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^ Abstract 

We present a near linear time algorithm for constructing hierarchical nets in finite metric 
^ ■ spaces with constant doubling dimension. This data-structure is then applied to obtain improved 

I algorithms for the following problems: Approximate nearest neighbor search, well-separated pair 

■ decomposition, spanner construction, compact representation scheme, doubling measure, and 

I computation of the (approximate) Lipschitz constant of a function. In all cases, the running 

(preprocessing) time is near-linear and the space being used is linear. 

Q ■ 1 Introduction 
O 



Given a data set, one frequently wants to manipulate it and compute some properties of it quickly. 
For example, one would like to cluster the data into similar clusters, or measure similarity of items 
in the data, etc. One possible way to do this, is to define a distance function (i.e., metric) on the 
' data items, and perform the required task using this metric. Unfortunately, in general, the metric 

■ might be intrinsically complicated ("high dimensional"), and various computational tasks on the 
data might require high time and space complexity. This is known in the literature as "the curse 

Q I of dimensionality" . 

■ One approach that got considerable attention recently is to define a notion of dimension on a 
finite metric space, and develop efficient algorithms for this case. One such concept is the notion 
of doubling dimension |2l [23 ESI • "^^^ doubling constant of metric space M. is the maximum, over 
all balls b in the metric space M, of the minimum number of balls needed to cover b, using balls 

- with half the radius of b. The logarithm of the doubling constant is the doubling dimension of 

r\ \ the space. The doubling dimension can be thought as a generalization of the Euclidean dimension, 

' as H'^ has Q{d) doubling dimension. Furthermore, the doubling dimension extends the notion of 

growth restricted metrics of Karger and Ruhl . 

Understanding the structure of such spaces (or similar notions), and how to manipulate them 
efficiently received considerable attention in the last few years 1313 1231 12H1 IHH OHl • 
The low doubling metric approach can be justified in two levels. 

1. Arguably, non-Euclidean, low (doubling) dimensional metric data appears in practice, and 
deserves an efficient algorithmic treatment. Even high dimensional Euclidean data may have 
some low doubling dimension structure which make it amenable to this approach. 
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This view seems to be shared by many recent algorithmic papers on doubhng metrics, but it 
still awaits a convincing empirical and/or theoretical support. 

2. Even if one is only interested in questions on Euclidean point sets, it makes sense to strip the 
techniques being used to their bare essentials, obtaining better understanding of the problems 
and conceptually simpler solutions. 

More arguments along these lines can be found in jl4j . where the author advocates this approach. 

In general, it is impossible to directly apply algorithmic results developed for fixed dimensional 
Euclidean space to doubling metrics, since there exists doubling metrics that can not embedded 
in Hilbert space with low distortion of the distances |4H I35j . Hence, some of the aforementioned 
works apply notions and techniques from fixed dimensional Computational Geometry and extend 
them to finite metric spaces. 

In particular, Talwar JHI showed that one can extend the notion of well-separated pairs de- 
composition (WSPD) of 11 to spaces with low doubling dimension. Specifically, he shows that 
for every set P of n points having doubling dimension dim, and every e > 0, there exists WSPD, 
with separation 1/e and 0(ne''^^*^™'* log ^) pairs, where dim is the doubling dimension of the finite 
metric space, and $ is the spread of the point set, which is the ratio between the diameter of P 
and the distance between the closest pair of points in P. This is weaker than the result of Callahan 
and Kosaraju fo^' Euclidean space, which does not depend on the spread of the point set. 

Krauthgamer and Lee |34j showed a data structure for answering (1 + e)-approximate near- 
est neighbor queries on point set P with spread Their data structure supports insertions in 
0(log$loglog$) time. The preprocessing time is 0(nlog$loglog$) (this is by inserting the 
points one by one), and the query time is 0(log<I> + e"'^^'^™)). In IR'^ for fixed d, one can answer 
such queries in 0(loglog($/e)) time, using near linear space, see (24] and references therein (in 
fact, it is possible to achieve constant query time using slightly larger storage [2Sj). Note however, 
that the latter results strongly use the Euclidean structure. Recently, Krauthgamer and Lee pS] 
overcame the restriction on the spread, presenting a data-structure with nearly quadratic space, 
and logarithmic query time. 

Underlining all those results, is the notion of hierarchical nets. Intuitively, hierarchical nets are 
sequences of larger and larger subsets of the underlining set P, such that in a given resolution, 
there is a subset in this sequence that represents the structure of P well in this resolution (a formal 
definition is given in Section|2)). Currently, the known algorithms for constructing those nets require 
running time which is quadratic in n. 

An alternative way for constructing those nets is by the clustering algorithm of Gonzalez |2nj . 
The algorithm of Gonzalez computes 2-approximate /c-center clustering by repeatedly picking the 
point furthest away from the current set of centers. Setting k = n, this results in a permutation of 
the points in the metric space. It is easy to verify that by taking different prefixes of this permu- 
tation, one gets hierarchical nets for the metric. However, the running time of Gonzalez algorithm 
in this case is still quadratic. Although, in fixed dimensional Euclidean space the algorithm of 
Gonzalez was improved to 0(n log A;) time by Feder and Greene ,17], and linear time by Har-Peled 
those algorithms require specifying k in advance, and they do not generate the permutation 
of the points, and as such they cannot be used in this case. 

Our results. In this paper, we present improved algorithms for the aforementioned applications, 
having near linear preprocessing time and linear space. We also remove the dependency on the 
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spread. As such, we (almost) match the known results in computational geometry for low dimen- 
sional Euclidean spaces. 

We assume that the input is given via a black box that can compute the distance between 
any two points in the metric space in constant time. Since the matrix of all (2) distances has 
quadratic size, this means that in some sense our algorithms have sublinear running time. This 
is not entirely surprising since subquadratic time algorithms exist for those problems in fixed 
dimensional Euclidean space. Thus, our paper can be interpreted as further strengthening the 
perceived connection between finite spaces of low doubling dimensions and Euclidean space of low 
dimension. Furthermore, we believe that our algorithms for the well-separated pair decomposition 
and approximate nearest neighbor are slightly cleaner and simpler than the previous corresponding 
algorithms for the Euclidean case. 

Net-tree. In Section |31 we present a 2'-^(*^™)nlogn expected time randomized algorithm for con- 
structing the hierarchical nets data-structure, which we call net-tree. 

Approximate Nearest Neighbor (ANN). In Section |3] we show a new data-structure for 
(1 -|- e)-approximate nearest neighbor query. The expected preprocessing time is 20('^™)nlogn, the 
space used is 2'^('^™)n, and the query time is 2*^^'^™) logn + e~^^'^^^\ The quality of approximation 
of the nearest neighbor required is specified together with the query. 

This query time is almost optimal in the oracle model since there are examples of point sets in 
which the query time is 2^^^^^™) logn p4 , and examples in which the query time is ^-^(dim) 1 

Our result also matches the known results of Arya et al. ^ in Euclidean settings. Furthermore, 
our result improves over the recent work of Krauthgamer and Lee, which either assumes bounded 
spread [H^, or requires quadratic space The algorithms in [Tl 1341 15^ are deterministic, in 

contrast to ours. 

Well- Separated Pairs Decomposition (WSPD). In Sectional we show that one can construct 
a e"^ well-separated pairs decomposition of P, in near linear time. The number of pairs is ne~^^^^^\ 
The size of the WSPD is tight as there are examples of metrics in which the size of the WSPD is 
j^^-r2(dim)_ Q^j, j.ggyi^ improves over Talwar's 0^ work, and matches the results of Callahan and 
Kosaraju (these algorithms are deterministic, though). 

Spanners. A t-Spanner of a metric is a sparse weighted graph whose vertices are the metric's 
points, and in which the graph metric is i-approximation to the original metric. Spanners were 
first defined and studied in [lUj. Construction of (1 -|- e)-spanners for points in low dimensional 
Euclidean space is considered in j3H llOj. Using Callahan's technique |T^, the WSPD construction 
also implies a near linear-time construction of (1 -|- e)-spanners having linear number of edges for 
such metrics. Independently of our work, Chan et. al. show a construction of (1 -|- e)-spanner 
for doubling metrics with linear number of edges. Their construction is stronger in the sense that 
the degrees in their spanner graph are bounded by constant. However, they do not specify a bound 
on the running time of their construction. 

^Consider the set with the iaa norm, where m — [e~^/2] , n = [dim] . Consider a query at point q satisfying, 
3x0 & such that d{q,xo) — m — 1, and \/x G ZJ^, x ^ xq => d{q,x) — m. Since xo can be chosen in adversarial 
way any randomized (1 + £)-ANN query algorithm would have to make Q{m") distances queries before hitting xq. 
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Compact Representation Scheme (CRS). In Section El we construct in near linear time, a 
data-structure of linear size that can answer approximate distance queries between pairs of points, 
in essentially constant time. CRS were coined "approximate distance oracles" in j45j . Our result 
extends recent results of Gudmunsson et al. I22j who showed the existence of CRS with similar 
parameters for metrics that are "nearly" fixed-dimensional Euclidean (which are sub-class of fixed 
doubling dimension metrics). We also mention in passing that our CRS technique can be applied 
to improve and unify two recent results |44l I43j on distance labeling. 

Doubling Measure. A doubling measure /i is a measure on the metric space with the property 
that for every x £ P and r > 0, the ratio fi{h{x,2r))/ fi{h{x,r)) is bounded, where b(x,r) = {y : 
d{x,y) < r}. Vol'berg and Konyagin Wl\ proved that for finite metrics (and in fact for complete 
metrics |36p the existence of doubling measure is quantitatively equivalent to the metric being 
doubling. This measure has found some recent algorithmic applications [l^l) and we anticipate 
more applications. Following the proof of Wu [151 ; present in Section [71 a near linear time 
algorithm for constructing a doubling measure. 

Lipschitz Constant of a Mapping. In Section [HI we study the problem of computing the 
Lipschitz constant of a mapping f : P ^ B. In particular, we show how using WSPD it is possible 
to approximate the Lipschitz constant of / in near linear time (in |P|) when P has constant doubling 
dimension (and B is an arbitrary metric). We also obtain efficient exact algorithms, with near linear 
running time, for the case where P is a set of points in one or two dimensional Euclidean space. 

Computing the Doubling Dimension. Although not stated explicitly in the sequel, we assume 
in Section [21 through Section [HI that the doubling dimension of the given metric is either known a 
priori or given as part of the input. This assumption is removed in Section [21 where we remark that 
a constant approximation of the doubling dimension of a given a metric M can be computed in 
20(dim)^ jpg^ time. It is therefore possible to execute the algorithms of Section [2l through Section[Hl 
with the same asymptotic running time, using the approximation of the doubling dimension (In all 
the cases where the doubling dimension is needed, any upper bound on it will do, with accordingly 
degraded running time). 

Most of the algorithms in this paper are randomized. However, our use of randomness is 
confined to Lemma 12.41 (except for Section 18. Ij) . This means that the algorithms always return the 
desired result, with bounds on the expected running time. This also gives the same asymptotic 
bound with constant probability, using Markov inequality. Furthermore, in the ANN and CRS 
schema, randomness is only used in the preprocessing, and the query algorithms are deterministic. 
Lemma 12.41 can be easily derandomized in 0{v?) time, thus giving n^polylog(n) deterministic 
algorithms for all problems discussed here. We do not know whether a non-trivial derandomization 
is possible. 

2 Preliminaries 

Denote by a metric space, and P a finite subset P C M. The spread of P, denoted by ^(P), is 
the ratio between the diameter of P, and the distance between the closest pair of points in P. For a 
point p £ M. and a number r > 0, we denote by b(p, r) = {q £ Ai\ dj^ip, q) ^ the ball of radius 
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r around p. The doubling constant X of P defined as the minimum over m G N such that every 
ball b in P can be covered by at most m balls of at most half the radius. The doubling dimension 
of the metric space is defined as log2 A. A slight variation of the doubling constant is that any 
subset can be covered by A' subsets of at most half the diameter. It is not hard to see that log2 A 
and log2 A' approximate each other up to a factor of 2. Since we will ignore constant factors in the 
dimension, these two definitions are interchangeable. It is clear that log2 A'(P) < log2 \'{A4), thus 
the doubling dimension of P is "approximately" at most that of A4. 

A basic fact about A doubling metric M that will be used repeatedly is that if P C M has 
spread at most then \P\ < AC(iog2*). 



2.1 Hierarchy of Nets 

An r-net in a metric space is a subset M C Ai of points such that sup^^_x4 dMi^i-^) ^ 
and iiifx,yeM; xj^y dM{x, y) > r/a, for some constant a > 1. r-nets are useful "sparse" object that 
approximately capture the geometry of the metric space at scales larger than 3r. In this paper we 
will heavily use the following notion of hierarchical nets. 

Definition 2.1 (Net-tree) Let P C be a finite subset. A net-tree of P is a tree T whose set 
of leaves is P. We denote by P^ C P the set of leaves in the subtree rooted at a vertex v e T. 
With each vertex v associate a point rep„ G P^. Internal vertices have at least two children. Each 
vertex v has a level i{v) € 2 U {— oo}. The levels satisfy i{v) < l{p{v)), where p(f ) is the parent 
of V in r. The levels of the leaves are — oo. Let r be some large enough constant, say r = 11. 
We require the following properties from T: 

Covering property: For every vertex v eT, 

b(rep„:^-/('^)) DP„. 
Packing property: For every non-root vertex v E T, 

b(rep.,20yr^*(^^)-^)n^^^- 

Iniieritance property: For every non-leaf vertex u eT there exists a child v e T of u such 
that repu = rep^. 

The net-tree can be thought of as a representation of nets from all scales in the following sense. 

Proposition 2.2 Given a net-tree, let 

Afc{l) = {rep, \ £{u)<l<mu))]. 

Then the points in Mc{l) are pairwise r'~-^/4 separated; that is, for any p,q E Nc{l) we have 
dM{p,Q) > t'~-'^/4. In addition, P C Upg^p(/)b(p, 4 • r'). 

Proof: Let p,q £ JVc^l), and let u and v be the corresponding nodes in the net-tree, respectively. 
Consider the balls bp = h{p,rp) and b^ = b(g, r^), where rp = 2{t^-i) ' t^^p^")^"^ and = 2{t~-i) ' 
7-^{p(''))-i. The sets bpHP and b^nP are fully contained in P„ and P„ respectively, by the definition 
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of the net-tree. Since u and v are on different branches of the net-tree, P„ and are disjoint. But 
then d_\4{p, q) > max{rp, r^} > 21t"~i) ' ''"'"^ — ''"'""^Z^' the definition of Nc{l) and since r = 11. 

Similarly, consider the set of nodes Vc{l) = £{u) < I < £{p{u)) | realizing A/'c(/). For any 

V G Vc{l), we have Pv C b (^rep^ , • r^^'') ) C b(rep^, 2r7(r - 1)) C b(rep„,T'), since r > 3. 

Thus, P C U^gy^(,)b(rep^,r^) = Upe_v-^(i)b(p, r'), as required. ■ 

Although J\fc{-) are quantitatively weaker nets compared with the greedy approach,^ they are 
stronger in the sense that the packing and the covering properties respect the hierarchical structure 
of the net-tree. 

The packing and covering properties easily imply that each vertex has at most A'^^^^ children. 
Net-trees are roughly equivalent to compressed quadtrees . The Net-tree is also similar to the sb 
data-structure of Clarkson but our analysis and guaranteed performance is new. 

2.2 The Computational Model. 

The model of computation we use is the "unit cost floating-point word RAM model" . More precisely, 
for a given input consisting of poly(n) real numbers at the range [— — <I>"^] U [^^^, and given 
an accuracy parameter i G N, the RAM machine has words of length 0(logn + loglog<I> + t). These 
words can accommodate floating point numbers from the set 

|±(l + x)2?' xe [0,1], x2-* en,ye [-n^W log'^^^) $,n^W log'^W n z} , 

and integers from the set | — (2*nlog<I>)'^(^\ . . . , 0, . . . , (2*nlog $)'^^^)}. For simplicity, we assume 
that the input given in this way is exact. All the problems discussed in this paper have an accuracy 
parameter e > 0. We assume that e'^^^^ > 2~*, to avoid rounding problems. The space used by an 
algorithm (or a scheme) is the number of words being used. The machine allows arithmetic, floor, 
ceiling, conversion from integer to floating point, logarithm and exponent operations in unit time. 
We further assume that the machine is equipped with a random number generator. 

Floating-point computation is a very well studied topic, see |32( Ch. 4] and references therein. 
However, we were unable to trace a citation that explicitly defines an asymptotic floating-point 
computational model. We choose this model for two related reasons: 

1. The algorithms in this paper are supposed to output only approximate solution. Therefore it 
makes sense to try and use approximate numbers since they use less resources. 

2. An important theme in this paper is developing algorithms that are independent of the spread 
of the given metrics. Most algorithms that have an explicit dependence on the spread in their 
time or space complexity, have some form of polylog(<I>) dependence. An algorithm that has 
no dependence on the spread but relies on words of length 0(log<I>), may be considered 
suspicious at best. 

Having said that, for the most part in the sequel we will ignore numerical and accuracy issues 
in our algorithms. The algorithms are simple enough that it is evidently clear that no numerical 
stability issues arise. A notable exception is Assouad's embedding discussed in Section [6.21 There 
we have to explicitly add another ingredient f Lemma 16. 9|) to the algorithm in order to adapt it to 
the floating point word RAM model. Indeed, that section is the catalyst for the current discussion. 

■^We have made no attempt to optimize the ratio between the packing and covering radii, and the one reported 
here can be (substantially) improved. However, some degradation in this ratio seems to be unavoidable. 
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2.3 Finding a Separating Ring 

We next present a simple argument that helps to overcome the dependence on the spread in the 
running time. 

Observation 2.3 Denote by ropt{P,m) the radius of the smallest ball in P (whose center is also 
in P) containing m points. Then in a metric space with doubling constant A, any ball of radius 2r, 
where r < 2ropt(P,m), contains at most points. 

Proof: By the doubling property, the ball of radius 2r can be covered by balls of radius 
ropt(-P, to). Each such ball contains at most m points. ■ 

Lemma 2.4 Given an n-point metric space P with doubling constant \, one can compute a ball 
b = h{p,r), such that b contains at least m = n/(2\^) points of P, and b(p, 2r) contains at most 
n/2 points of P. The expected running time of this algorithm is O(A^n). 

Proof: Pick randomly a point p from P, and compute the ball b(p, r) of smallest radius around 
p containing at least n/{2}?) points. Next, consider the ball of radius b(p, 2r). If it contains < n/2 
points we are done. Otherwise, we repeat this procedure until success. 

To see why this algorithm succeeds with constant probability in each iteration, consider the 
smallest ball Q = P r\ b(g,ropt) that contains at least m points of P. Observe that any ball of 
radius ropt/2 contain less than m points. With probability 1/(2A^) our sample is from Q. If p G Q, 
then r < 2ropt, and by the doubling property the ball b(p, 4ropt) can be covered by at most A^ 
balls of radius ropt/2. Hence it holds that |P n b(p, 2r)| < A^m < n/2. 

Thus, the algorithm succeeds with probability 1/(2A^) in each iteration, and with probability 
> 1/3 after 2A'^ iterations, implying the result, as each iteration takes 0{n) time. ■ 

Lemma 12.41 enable us to find a sparse ring of radius "not much larger" than its width. For 
example, using it we can find an empty ring of width h and radius at most 2nh in linear time. 

3 Computing Nets Efficiently 

In this section we prove the following theorem. 

Theorem 3.1 Given a set P of n points in Ai, one can construct a net-tree for P in 2C'(dim)^ 

log n 

expected time. 

The outline of the proof is as follows. In Section T^.ll we show how to construct Gonzalez sequence 
in 2<^('^''^)n log(n + $) time. We then eliminate the dependence of the running time on the spread 

in Section using a tool developed in Section In Section |23 we conclude the proof of 
Theorem I3.1l bv showing how to construct the net-tree from the Gonzalez sequence. We end with 
mentioning in Section [3.51 few data structures for efficient searching on the net-tree. 

3.1 Computing greedy clustering quickly 

Gonzalez j2nj presented a greedy algorithm, denoted by GreedyCluster, that when applied to a 
set of points P, computes a permutation of the points 11 = {pi,P2, ■ ■ ■ ,Pm), such that pi, . . . ,Pk are 
good centers for P, for any A; > 1. We refer to 11 as the greedy permutation of P. Formally, there 
are numbers ri, . . . , r„, such that P C uf^^h{pi, r^). Furthermore, mini<j<j<fc dj^{pi,pj) = r^^i. 
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GreedyCluster works by picking an arbitrary point in P to be pi, and setting ri to be the 
distance of the furthest point in P to pi. GreedyCluster stores for every point g G P its distance to 
the closest center picked so far; namely, in the beginning of the kth. iteration, for all g G P we have 
ttq = min*|r/ The algorithm sets the A;th center to be pk = argmaXpgpOp (namely, pk 
is the point in P furthest away from the centers picked so far). Clearly, r^-i = Op^. Implemented 
naively, one can compute the first k points pi, . . . ,pk in 0{nk) time. This requires just scanning 
the points k times. In the kth. iteration, updating = min(ag~^, d_\4{Q,Pk-i))^ and computing the 
point with the maximum such value. Thus, this leads to a 2-approximation to fe-center clustering 
in 0{nk) time. 

Feder and Greene improved the running time to 0{n\ogk) time (this was further improved 
to linear time by Har-Peled |2S1)- Feder and Greene's main observation was that when updating 
Qj^g needs to update this value only for points of P which are in distance < r^-i away from 
Pk , since for points q further away, the addition of pk can not change . 

This suggests a natural approach for computing the greedy permutation: Associate with each 
center in {pi, . . . ,pk} the points of P that it serves (namely, points that are closer to the given 
center than to any other center). Furthermore, each center pi, maintains a friends list that contains 
all the centers that are in distance at most Art from it. An "old" center will trim a point from its 
friends list only when it its distance is larger than Sr^ . Specifically, the friends list of pi at the kth. 
iteration [k > i) contains all the centers at distance at most min{8rfc,4rj} from pi. Because of the 
constant doubling dimension property, this list is of size A'^*-^\ 

We further maintains a max-heap in which every center pi, i < k maintains the point p'- furthest 
away from pi in its cluster along with its current Op/ = dMiPi^p't) value. 

At the kth iteration, the algorithm extracts the maximum value from the heap. It sets pk to 
be the corresponding point. Denote by Cp^. the closest point among {pi, . . . ,Pk-i} to pk (i.e., the 
cluster's center of pk at the end of the (/c — l)th round). Next, the algorithm scans all the points 
currently served by the same cluster as Cp^. , or by clusters containing points from friends list of Cp^. , 
and update the a value of those points. Furthermore, it moves all the relevant points to the newly 
formed cluster. In the process, it also update the points p'^ (of maximum distance from pi in its 
cluster) for all pi in the friends list of Cp^ . It also computes friends list of pk (how to exactly do it 
will be described in detail shortly). 

We next bound the running time. To this end, a phase starting at the ith iteration of the 
algorithm terminates at the first j > i such that rj_i < rj_i/2. A ball of radius 4rj-i around 
each point q ^ P contains at most points oi pi, . . . ,pj, and as such every point of P is being 
scanned at most A'^ times at each phase of the algorithm. Thus, if the spread of the point set is 
$, the number of phases is 0(log<I>), and scanning takes A'^'-^-'nlog $ time overall. Maintaining 
the max-heap costs an additional A'^^^^nlogn time, since in each iteration only A*^^^) values in the 
head are changed. 

The only remaining hurdle is the computation of the friends list of a newly formed center pk- 
This can be done by maintaining for every point p;, I E {1, . . . , n}, the serving center pii two phases 
ago (at the end of that phase). The new friends list of pk can be constructed by scanning the 
friends list of pk', and picking those in b(pfc,4rfc). This costs A*^^^^ time for pk and 0{\^^^^n) time 
overall. To see that this search suffices, we should see that the set {pi\i < k, dM{Pi-,Pk) < 4rfc} 
is scanned. Indeed, fix pi^, having io < ^) and dM{PioiPk) < ^r^. Let pk' be the center of pk two 
phases ago. From the definition, 2rk < Vk' < 4rfc, so d_M{Pk,Pk') < ^r^. The current (at the end 
of the (fc — l)th iteration) friends list of pk' contains all the current centers at distance at most 
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min{8rj!c, 4rfc/} = Sr^ frompfc^. Furthermore, 

dM{Pio,Pk') < dMiPio^Pk) + dM{Pk,Pk') < 8rfc. 

we are therefore guaranteed that pig will be scanned. 

Of course, as the algorithm progresses it needs to remove non-relevant elements from the friends 
list as the current clustering radius rj shrinks. However, this can be done in a lazy fashion whenever 
the algorithm scans such a list. 

Theorem 3.2 Let P be a n-point metric space with doubling constant A and spread Then the 
greedy permutation for P can be computed in 0(A'^^^^nlog(<^n)) time, and 0{X'^^^^n) space. 

3.2 Low Quality Approximation by HST 

Here we present an auxiliary tool that will be used in Section r3.3l to extend the net-tree construction 
of Section 123 to metric spaces with large spread. 

We will use the following special type of metric spaces: 

Definition 3.3 Hierarchically well-separated tree (HST) is a metric space defined on the leaves of 
a rooted tree T. With each vertex u € T there is associated a label > such that A„ = if 
and only if n is a leaf of T. The labels are such that if a vertex n is a child of a vertex v then 
Au < A„. The distance between two leaves x and y of T is defined as ^\ca{x,y)j where lca(x,y) is 
the least common ancestor of x and y in T. 

The class of HSTs coincides with the class of finite ultrametrics. For convenience, we will assume 
that the underlying tree is binary (any HST can be converted to binary HST in linear time, while 
retaining the underlying metric). We will also associate with every vertex n S T, an arbitrary leaf 
rep^ of the subtree rooted at u. We also require that rep„ G {i^ep^| u is a child of u}. 

A metric is called ^-approximation of the metric M, if they are on the same set of points, 
and dMiujv) < d]\f{u,v) < t • dM{u,v), for any u,v £ M.. 

It is not hard to see that any n-point metric is (n — l)-approximated by some HST (see, e.g. 
Lemma l3.6p . Here we show: 

Lemma 3.4 For n-point metric space Ai with constant doubling dimension, it is possible to con- 
struct in 0(n log n) expected time an HST which is 3n^ approximation of Ai. 

This low quality HST will help us later in eliminating the dependence on the spread of the 
construction time the net-tree and in distance queries. 

We begin proving Lemma 13.41 by constructing a sparse graph that approximates the original 
metric (this is sometimes called spanner). 

Lemma 3.5 Given an n-point metric space P with doubling constant X, one can compute a weighted 
graph G that "in -approximates P in O(A^nlogn) expected time. The graph G contains O(A^nlogn) 
edges. 

Proof: The construction is recursive. If n = 0(1), we just add all the pairs from P as edges. 
Otherwise, we compute, using Lemma 12.41 a ball b(c, r) containing at least m = n /(2A3) points of 
P with the additional property that b(c, 2r) contains at most n/2 points of P. 
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As such, there exists two numbers r',h such that r < r' < 2r, h > r/n and PCi b(c, r') = 
P n b(c, r' + h) (namely, the ring with outer radius r' -\- h and inner radius r' around c is empty 
of points of P). Computing r' and h is done by computing the distance of each point from c, and 
partitioning the distance range [r, 2r] into 2n equal length segments. In each segment, we register 
the point with minimum and maximum distance from c in this range. This can be easily done 
in 0(n) time using the floor function. Next, scan those buckets from left to right. Clearly, the 
maximum length gap is realized by a maximum of one bucket together with a consecutive non 
empty minimum of another bucket. Thus, the maximum length interval can be computed in linear 
time, and yield r and h. 

Let Pin = b(c,r') n P and let Pout = P \ Pin- Observe that (Pin, Pout) = minpgPj^^ggp^^^ 
dj^{p, q) > h > r /n. Next, we build recursively a spanner for Pin and a spanner for Pout- We then 
add the edges between c and all the points of P to the spanner. Let G denote the resulting graph. 

Since n/2 > |Pin| > n/2A^ points of P, the running time of the algorithm is r(|P|) = r(|Pin|) + 
r(l-foutl) + O(A^n) = O(A^nlogn). Similarly, the number of edges in G is O(A^nlogn). 

We remain with the delightful task of proving that G provides a Sn-approximation to the 
distances of P. Let Gin and Gout be the the graphs computed for Pin and Pout, respectively. 
Consider any two points u,v £ P. If n and v are both in Pin or both in Pout then the claim follows 
by induction. Thus, consider the case that u G Pin and v G Pout- Observe that (ImW, v) > h > r /n. 
On the other hand, 

r/n < dM{u,v) < dc{u,v) < dM{c,u) + dM{c,v) < r + r + dM{u,v) < (2n + l)d_x4{u,v), 

since d_M{c,v) < dM{c,u) + dM{u,v) < r + dM{u,v). Clearly, this implies that dG{u,v) < 
3ndj^{u,v), as claimed. ■ 
We will later obtain in Theorem 15.31 a near linear time construction of spanners that (1 + e)- 
approximate the original metric and have linear number of edges. 

Lemma 3.6 Given a weighted connected graph G on n vertices and m edges, it is possible to 
construct in 0(n log n + m) time an HST H that (n — 1)- approximates the shortest path metric on 
G. 

Proof: Compute the minimum spanning tree of G in 0(n log n + m) time, and let T denote this 
tree. 

Sort the edges of T in non-decreasing order, and add them to the graph one by one. The HST 
is built bottom up. At each point we have a collection of HSTs, each corresponds to a connected 
component of the current graph. When an added edge merges two connected components, we merge 
the two corresponding HSTs into one by adding a new common root for the two HST, and labeling 
this root with the edge's weight times n — 1. This algorithm is only a slight variation on Kruskal 
algorithm, and has the same running time. 

We next estimate the approximation factor. Let x and y be two vertices of G. Denote by e 
the first edge that was added in the process above that made x and y to be in the same connected 
component G. Note that at that point of time e is the heaviest edge in G, so w{e) < dcix^y) < 
(|G| — 1) w{e) < (n — 1) w{e). Since dnix, y) = {n — 1) w{e), we are done. ■ 

The proof Lemma 13.41 now follows by applying Lemma 13.61 on the spanner from Lemma 13.51 

Note that by applying Lemma 13.61 on the spanner from Theorem 15.31 one can obtain a near 
linear time construction of HST which 0(n) approximates that original metric. 
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3.3 Extending greedy clustering to metrics of large spread 

The main idea in removing the dependence of the running time on the spread is to apply the 
algorithm of Section lTTl to a dynamic set of points that will correspond to a level of the HST. In more 
details, the set of points will correspond to the representatives rep^, where < rcurr/n'^ < ^p(f)' 
where rcurr is the current greedy radius, A„ is the HST label of v (i.e. the diameter of subtree 
rooted at v), and p(f ) is the parent of v in the HST. The algorithm now needs to handle another 
type of event, since as the algorithm proceeds, the greedy radius decreases to a level in which 

> ^curr/"-^- In this case v should be replaced by its two children u,w. Specifically, if v belongs 
to a cluster of a point pi, we remove rep^ from the list of points associated with the cluster of 
Pi, and add rep„ and rep^ to this list (the case where pi is equal to rep^ is handled in a similar 
fashion). Next, we need to compute for the new point its nearest center; namely, compute ttrep^ 
and Oirep^ (in fact, since rep„ = rep„ or rep^ = rep^, we need to compute only one of those values). 
To this end, we scan the friend list of pi, and compute arep„ and arep„ from it. This takes A'^^^^ 
time. We also need to insert {rep„,rep^} \ {rep^} into the max-heap. 

Thus, the algorithm has two heaps. One, is a max-heap maintaining the points according to 
their distances to the nearest center, that is for every point p £ P we maintain the values of ap in 
a max-heap. The second max-heap, maintains the nodes of the HST sorted by their diameters A 
(multiplied by a factor of for normalization). At every point, the algorithm extract the larger 
out of two heaps, and handle it accordingly. One important technicality, is that the algorithm is 
no longer generating the same permutation as GreedyCluster, since we are not always picking the 
furthest point to add as the next center. Rather, we add the furthest active point. We refer to the 
new algorithm as NetPermutAlg. 

Lemma 3.7 Let tt = be the permutation of P generated by NetPermutAlg. Fur- 

thermore, let rf^ = ttp^^^ = T^^^i=idM{QjPi)- Then, P C Ui^jh{pi,(l + n^^)rk) and for any 
u,v £ {pi, . . . ,pk} we have d^iu, v) > {1 — n~'^)rk. 

Proof: Clearly, the ball of radius around pi, . . . ,pk cover all the active points when pk+i was 
created. However, every active point might represent points which are in distance r/^/n'^ from it. 
Thus, by expanding the radius by (1 -|- 1/n^), those balls cover all the points. 

Observe, that this implies that for any i < j we have (1 -|- n~'^)ri > rj. In particular, let a < A; 
be the minimum number such that u,v £ {pi, . . . ,Pa}- Clearly, dM{u,v) > ra-i > rk/{l +n~'^) > 
(l-n-2)rfc. ■ 

Lemma 3.8 The expected running time o/ NetPermutAlg is 0{\'-'^^^nlogn). 

Proof: Constructing the HST takes A'^'^^^nlogn expected time, using Lemma \'AA\ As in the 
bounded spread case, we conceptually divide the execution of the algorithm into phases. In the ith 
phase, the algorithm handles new clusters with radii in the range diam(P)/2*-i and diam(P)/2\ 
Consider a point p £ P: It is being inserted into the point-set when a node v in the HST is being 
"split" at phase i (since p is the representative point for one of the children of v). Let p and q be 
the two representative points of the two children of v. We charge v for any work done with p and 
q for the next L = 10 log n phases. Consider any work done on p before it undergoes another split 
event. If p is at most L phases away from the split event of v, the vertex v pays for it. 

Otherwise, consider p at > L phases away from its latest split event that happened at v. Let 
Tcurr be the current clustering radius, and observe that p represents a set of points which has a 
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diameter < rcurr/n^ and rcurr < A^/n^'^. In particular, this implies that P H b(p, rcurr • n^) C 
P n b(p, At,/n^) C P n b(p, Tcurr/^^)- Namely, all the points that p represents are very far from 
the rest of the points of P, in terms of rcurr- In particular, it can not be that the cluster that p 
represents is in any updated friends list in the current stage. (It can be in a friends list that was 
not updated lately, since we use lazy evaluation. However, when this friends list will be used, it 
will be updated and p will disappear from it. Note that the work put on updating the friends lists 
is A*^*^^^?! overall, see Section 13.11 1 Thus p does not require any work from the algorithm till it 
undergoes another split event. 

Thus, every node in the HST is charged with A^-^^^^logn work. It follows, that the overall 
running time of the algorithm A'^^^^nlogn. ■ 



3.4 Constructing the Net-tree 

In this section we conclude the description of the algorithm for constructing the net-tree, and prove 
Theorem 13.11 

The construction of the net-tree T is done by adding points of P according to the NetPermut- 
Alg's permutation. As mentioned before, the construction algorithm and the resulting tree is 
similar to the data-structure of Clarkson (our analysis and the guaranteed performance are 
new, however). The tree constructed ioi pi, . . . ,pk is denoted by T^^\ and T = T^"-\ We obtain 
T(^) from pC'^i) as follows. 

During the construction, we maintain for every vertex u G T^'^^ a set of vertices Rel(n), which 
are the vertices close by. Namely, the set Rel(ii) would be in fact the set 

Rd(u) = r(^') £{v) < i{u) < ^(p(v)), and dx(rep„,repj < 13 • r^(") } , 

where r is the packing constant associated with the Net-tree, see Definition l2.1l (Since we compute 
Rel(u) indirectly, the fact that Rel(ii) = Rel(ti) requires a formal proof, see Lemma 13.91 (v)-) The 
set Rel(n) is of size A*^^^^ throughout the algorithm's execution. 

We denote by r j = min | r j 1 < j < « | . 

The Algorithm. The kih point in the permutation, pk, will be added as a leaf to the tree T^^~^^ 
to form the tree T^''\ As such we fix i{pk) = — oo, and rep^^ = pk- Let I = [log^rfc_i]. 

Let h be the largest index such that log^Yh-i > I (i.e., ph is the last added center in the previous 
phase). Let q £ {pi, . . . ,ph} the closest point to pk among {pi, . . . ,ph}; namely, q is the nearest 
neighbor to pk in all the centers present in the previous phase. Identifying q with the unique leaf 
of r(^^^) whose representative is q, let u = p{q). We obtain T^^'^ as follows. 

(a) If £{u) > I, then we make a new vertex v, set i{v) = I and rep^ = q. We then connect q and p^ 
as children of v, and make v a child of u. 

(b) Otherwise, connect pk as another child of u. 

Finding q. Let Cp^. be the closest point among {pi, . . -Pk-i} to pk (this information is computed 
by NetPermutAlg, see Section l3?T] for details). Denote u = p(cp^). We consider two cases: 

(1) If ^(S) > then q = u, see Lemma EH (0) for a proof. 
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(2) Otherwise, i{u) = I. In this case, q must be in the set |rep^„ w £ Rel(n) |, see Lemma (0) 
for a proof. So, we just pick q to be the nearest neighbor to pk in |rep^ w £ Rel(u) |. 

Updating Rel(-). For each new vertex added x we do the following: Let y = p{x). For each 
z £ Rel(y), and for each child z' of z we traverse part of the tree rooted at z' in the following way: 
When visiting a vertex u, we check whether u should be added to Rel(x) and whether x should be 
added to Rel(M) according to the Rel(-) definition, and update Rel(x) and Rel(ti) accordingly. If 
X has been added to Rel(ii) then we continue by traversing the children of u. Otherwise, we skip 
them. 

Note, that this might require scanning a large fraction of the net-tree, as x might appear in a 
large number of Rel() lists. 

Lemma 3.9 For any k £ [1, . . . ,n], the tree T^^^ has the following properties, 
(i) The part of the algorithm that finds q, indeed finds it. 
(a) If V is a child of u, then d_/v4(rep„,rep^) < 2 • t^^^\ 

(Hi) For every t £ H, every pair of points in Mc{t) is at least r*~^ far apart, 
(iv) T^^^ is a net-tree of {pi, . . . ,pk}. 
(v) For any u £T, Rel(ii) = Rel(n). 

Since the proof of Lemma 13.91 is tedious, we defer it to Appendix ^ We next analyze the 
running time. 

Lemma 3.10 Given the (approximate) greedy permutation {pi, . . . ,pn) with their "current" clus- 
ter's center {cp^, . . . ,Cp^), the algorithm for constructing the net-tree runs in n time. 

Proof: By the definition of Rel(-), the size of each such list is at most A*^*^^-*. Assuming the tree 
is implemented reasonably (with pointers from a vertex to its children and parent), the part of 
constructing the tree clearly takes 0{\'-'^^^) time per a new point. 

Next we estimate the time to construct Rel(-). For each vertex inserted x, we first charge A*-^^^^ 
visits for visiting the children of Rel(p(3;)). All the other visits are charged to the parent of the 
visited vertex. Each vertex has at most A*^^^^ children, and its children are visited only if a new 
entry was inserted to its Rel(). As the total size of the Rel(-) lists is at most A'^^^^n, we have just 
bounded the number of visits of vertices during the update process of Rel(-) to A'^(i)n. Thus the 
time spent is n. ■ 



3.5 Augmenting the Net-tree 

In order to efficiently search on the net-tree, we will need the following three auxiliary data struc- 
tures. 

The first one allows, given a vertex v of level I, to find all the the vertices of "roughly the 
same level" which are nearby; i.e., whose representative is at distance at most O(t') from the 
representative of v. More accurately, we need a fast access to Rel(f), as defined in Section 123 We 
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have seen in that section how to construct it in near hnear time such that the whole hst can be 
accessed in O(A^) time. 

The second data-structure enables the following seek operation: Given a leaf x, and a level 
/, find the ancestor y of x such that ^(p(y)) > I > ^(y)- Bender and Farach-Colton |5] present a 
data-structure that can be constructed in linear time over a tree T, such that given a node x, and 
depth d, it outputs the ancestor of x at depth d at x. This takes constant time per query. Thus, 
performing the seek operation just requires performing a binary search using T> over the net-tree, 
and this takes O(logn) time. 

Our third data-structure supports a restricted version of the above seek operation: Given a leaf 
X, an ancestor z of x, and given a level /: If Z ^ i^i^) — clog n, i{z)] return "don't know" . Otherwise, 
return an ancestor y of x satisfying £{p{y)) > I > i{y) (here c > is an absolute constant). The 
data structure has 0{n) space, 0(n log n) preprocessing time, and the queries can be answered in 
constant time. 

As a first step, observe that if for every internal vertex z and a descendant leaf x we add vertices 
to the tree so as to fill all levels between l{z) and l{x) — clogn on the path between z and x, then 
queries to I level ancestor, / G [^(a;) — clogn, (.{z)] can be answered by using the data structure V 
as above to find an ancestor of x at depth d{z) — {i{z) — I). This construction, however, may blow 
up the number of vertices in the net-tree (and hence the space) by a log n factor. 

To obtain linear space we do the following: In the preprocessing step we enumerate all the 
possible patterns of existence/nonexistence of vertices in 0.51og2n consecutive levels. For each 
given pattern, and each given level in the pattern we write the number of actual vertices above 
this level. Preparing this enumeration takes only 0{^/nlogn) time. Now, for each vertex u of the 
net-tree, we hold 2c pointers to such patterns that together map the vertices in the clogn level 
below V on the path to u, where v is an ancestor of u at depth d{u) — clogn, if such v exists (note 
that V is clogn edges above u in the net-tree, but u holds the pattern of only the first clogn levels 
below v). This data structure can be clearly computed in 0(n log n) time using top-down dynamic 
programming on the net-tree. 

Given a query (with x, z and / as above), we do as follows: Let u be an ancestor of x at depth 
max{d(z) + clogn, d{x)}. Vertex u can be accessed in 0(1) time using the data-structure P. Using 
the patterns pointed by u we can find the depth of the relevant vertex whose level is just below / 
in 0(1) time, and now using P again we can access this vertex in constant time. 

4 Approximate Nearest-Neighbor Search 

In the following, ANN stands for approximate nearest neighbor. In this section, we present an 
approximate nearest neighbor (ANN) scheme, that for a given set of points P, preprocess it in near 
linear time, and produce a linear space data-structure which answers queries of the form "given 
point q, find p £ P, such that d{q,p) < (1 -|- e)d{q, P)" in logarithmic time. See Section ^ for more 
details. 

In Section r4.11 we present a variant of Krauthgamer and Lee pil net navigation ANN algorithm 
that works on the net-tree. This algorithm allows to boost an yl-ANN solution to (1 + e)-ANN 
solution in 0(log n + log(A/e)) query time. In Section [4. 21 we present a fast construction of a variant 
of the ring separator tree |29l I33j . which support fast 2n-ANN queries. We conclude in Section f4. 31 
with the general scheme which is a combination of the previous two. 
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4.1 The Low Spread Case 

Lemma 4.1 Given a net-tree T of P, a query point q £ M, and vertex u €z T at level I = i{u) 
such that (i;\4(rep„, g) < 5 • r' or p £ P^, where p is the nearest neighbor to q in P. Then there 
is an algorithm that traverse T form u downward, such that for any t G N, after t + 4 steps, the 
algorithm reaches a vertex s for which rep^, is a {1 + t^~^~'^)-ANN, where f = \og^dj^{p,q). The 
running time of this search is \^^^^ min{t, / — /} + \^^™'^^^^~'y^~f^'^^\ 

Proof: The query algorithm works as follows. It constructs sets Ai of vertices in T with the following 
properties: 

1. For each v £ Ai, ^(p('u)) > i > i{v). 

2. p£ U^eA,P„Ch{q,dMiq,p) + {'^3 + T^)-r'). 

The algorithm starts by setting Ai = Rel{u). lip £ Pu then Ai clearly satisfies the two properties 
above. If d;\4(rep„,g) < 5 • r^, then d_A4(i'ep„,p) < WtK Suppose for the sake of contradiction that 
p ^ UvsAiPv, then 3v' such that £{v') < I, (i;vi(rep„,rep„/) > 13t', and p G Pyi. But then from the 
covering property (ix(rep„/,p) < :;^t' which means that dx(rep„,p) > (13 — -f^)^'' > lOr', a 
contradiction. 

The set Ai-i is constructed from Ai as follows: Let v G Aihe the closest vertex in Ai to q, i.e., 
(i_A/((rep„, g) = min^g^. (i_A4 (rep^ , g) . Let B the set obtained from Ai by replacing every vertex of 
level i with its children. The set j4j_i is obtained from B by throwing out any vertex w for which 
rep^) > rep^) + ■ r*~^. It is easily checked that Ai-i has the required properties. 

The running time is clearly dominated by A*^^^^ times the sum of the Ai's sizes. For i > f, 
dji^{q,iej)^) is at most ■ r*, and therefore \Ai\ < A*^^^^. For i < f, we have only a weak bound 
of I Ail < X^if-'). Thus the running time of the algorithm for t steps follows. Notice that any point 
in Ai_i is (1 + t'-/-*+4)-ANN. ■ 

For a set P with spread ^, by applying the algorithm of Lemma [4.1l with u the root of T, and t = 
\log^{^ / e) — /], Lemma l4. II gives a (1 + e)-approximate nearest neighbor scheme with O(nlogn) 
expected construction time and 0(log$ + e~*^('^™)) query time. (Note that the algorithm does not 
need to know t (and thus /) in advance, it can estimate the current approximation by comparing 
dj^{q,i[ep^) to r*.) This gives an alternative to the data-structure of Krauthgamer and Lee pij . 
with a slightly faster construction time. Their construction time is 0(n log $ log log $) if one uses 
the insertion operation for their data-structure (note that in the constant doubling dimension 
setting, logn = 0(log <!>)). In fact, in this case, the Rel() data-structure is not needed since 
Rel(root) = {root}. Therefore the storage for this ANN scheme is 0{n), with no dependency on the 
dimension. A similar construction was obtained independently in 0. However, their construction 
time is O(n^). 

4.2 Low Quality Ring Separator Tree 

Lemma 4.2 One can construct a data- structure which supports 2n-ANN queries in 2*-'('^'™) logn 
time. The construction time is 2'^('^™^n log n, and the data- structure uses space. 

Proof: The data structure is a binary search tree S, in which each vertex of the tree v is associated 
with a point p^ G P and radius r^. We are guaranteed that n/2A^ < |b(pt,, r„)| < (l-l/2A3)n, and 
that {h{pv, (1 + l/2n)r„) \ b(p^, (1 — \/2n)ry)) f\P = The left subtree is recursively constructed 
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on the set Pnb(p^, r„), and the right subtree is recursively constructed on P\b(p„, r^,). The depth 
of S is clearly at most 0(A^ logn). 

The construction of S is similar to the construction of the low-quality spanner (Section l3.2|) and 
uses Lemma 12.41 as follows. Apply Lemma 12.41 to find p £ P and r such that |b(p, r)| > n/(2A^), 
whereas |b(p, 2r)| < n/2. From the pigeon-hole principle, there exists r' € [(1 + l/2n)r, 2r — r/2n) 
for which h{p, (1 -|- l/2n)r') \ h{p, (1 — l/2n)r') = 0. We now make a root v for the ring separator 
tree, set py = p, and r„ = r', and recurse on b(p^,, r„) as the left subtree, and P\h{py,ry) as the right 
subtree. The construction time T(n) obeys the recursive formula T(n) = T{ni) + T{n2) + 0{n), 
where rii + n2 = n, n/2X^ < ?^l < n/2. 

Once we have this data-structure, 2n-ANN can be found in O(A^logn) time as follows. Let 
the root of the ring separator tree be u. Given a query point q, check its distance to pu- If 
dMiQ^Pu) < then recurse on the left subtree. Otherwise, recurse on the right subtree. At the 
end, return the nearest point to q among where v is on the path traversed by the algorithm. 

The running time of this procedure is clearly dominated by the height of the tree which is 
0(A3 logn). 

To see that this is indeed 2n-ANN, let a be the vertical path in the tree traversed by the 
algorithm, and let b be the vertical path in the tree connecting the root to the nearest neighbor 
of q in P. Let v be the lowest common vertex of a and b. Suppose that a continued on the left 
subtree of v while b continued on the right subtree. In this case the distance from q to the nearest 
neighbor is at least ry/2n, while dM{Pv,Q) ^ fv Thus p^ is 2n-ANN. 

If a continued on the right subtree of v while b continued on the left subtree of v, then The 
distance from the nearest neighbor is at least r^/2n -|- idMiPv,Q) — r^), while p^ is at distance 
(i_A/( (pt, , g) . The ratio between this two quantities is clearly at most 2n. ■ 

Remark 4.3 As is pointed out in |29| I33j . it is possible to duplicate points in the ring for the two 
subtrees. Hence we can actually partition the b(p, 2r) \ b(p, r) into t < n sub rings, and choose to 
duplicate a "light" ring. When t = 1, we obtain the Ring Separator Tree from [SH], that supports 
0(1)-ANN queries, but requires n-^'^*'^'™' storage. For general t < n we obtain a data structure that 
supports 0(t)-ANN queries, and by choosing the right ring to duplicate, consumes only n*^^'"^^^-*^ 
storage. To see this, we set (3 = (31og2A)i/* and prove by induction on n that it is possible to find 
a ring such that the number of leaves in the tree is at most n^. Denote by r/,, = \h{p, (1 + i/t)r)\/n. 
Note that (2A)~^ ^ Vo ^ Vi — ' ' ' Vt ^ "-Z^, and therefore there exists i < t for which rji-i > rj^ , 
otherwise (2A)"^ < r/f < (1/2)^* which is a contradiction. Thus by duplicating the ith ring, and 
by applying the inductive hypothesis on the number of leaves in the subtrees, the resulting tree 
will have at most {rjiu)^ + ((1 — ?7j_i)n)^ < + (1 — ??j_i))n^ leaves. 

Thus, setting t = 0(loglog A • logn), we obtain a linear space ring separator tree that supports 
0(t)-ANN queries in O(logn) time. 

4.3 ANN algorithm for arbitrary spread 

The algorithm for arbitrary spread is now pretty clear. During the preprocessing we construct the 
augmented net-tree from Section |31 We also construct the low quality ring separator tree. The 
construction time is 2*^ ('^^^n logn, and the space used is 2'^(<^™) n. 

Given a query point q £ Ai, and the approximation parameter e > 0, the query algorithm 
consists of three steps: 
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1. First, find 2n-ANN pi using the low quality ring separator tree of Section [4.21 

2. Next find a vertex u in tlie net-tree wliich is an ancestor for pi and tliat satisfies 

m^)) - 1 > riog,(i6 • dMiPi,qm > (i^)- 

Hence 

dA,(rep,, q) < dAi(rep„,pi) + dM{pi,q) < 2.5 • + ^r^(P("))-i. 

3. We now split the analysis into two cases. 

(a) If 2.5 • T^(") > ^r^(p("))~-^, then clearly dx(rep^,g) < 5r^("), and thus u satisfies the 
conditions of Lemma l4. II 

(b) If on the other hand 2.5 • r^(") < ^^r^^P^"))"-^, then the packing property of the net-tree 
implies that 

Pnb(g,dA4(g,rep,J) CPnb(rep„,2a!^(g,rep„)) CPnb(rep„,i-r^(P("))-i) CPn, 
and therefore p £ Pu- Thus, in this case u also satisfies the conditions of Lemma l4. II 

4. Set / = l{u). Using the notation of Lemma l4.ll the fact that pi is a 2n-ANN, implies that 
/ > / — (1 -|- logn), thus by setting the number of steps to t = [log(n/e)], and applying the 
algorithm of Lemma l4. II we obtain (1 -|- e)-ANN. 

The running time of the query is 

A^(^) logn + O(logn) + A^(^) logn + e-O(dim) < ^O(i) ^^g^ ^ ^-O(dim)^ 

We summarize: 

Theorem 4.4 Given a set P of n points of hounded doubling dimension dim in a metric space M, 
One can construct a data-structure for answering approximate nearest neighbor queries (where the 
quality parameter e is provided together with the query). The query time is 2'^('^'™) logn-l-e"'-^^'^™), 
the expected preprocessing time is 2'^(<^™)nlogn, and the space used is 2'^('i''^)n. 

Theorem 14.41 compares quite favorably with the result of Krauthgamer and Lee j33j , which 
solves the same problem with the same (tight) query time but using 0(2*^*^ '^™^n^ polylog(n)) space. 

5 Fast construction of WSPD and Spanners 

Let P be an n-point subset of a metric space M with doubling dimension dim, and 1/4 > e > a 
parameter. Denote hy A<^ B the set y} \ x £ A, y £ . A well-separated pair decomposition 
(WSPD) with parameter of P is a set of pairs {{^i, Pi} , • • • , {Ag, Ps}}, such that 

1. Ai, Pj C P for every i. 

2. n Pj = for every i. 

3. U|=i^, Pi = P P. 
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4. dMi^iT Bi) > e ^ • max {diam(Aj), diam(i?j)} 

The notion of WSPD was defined by Callahan and Kosaraju for Euclidean spaces. Talwar 
j44j have shown that this notion transfer to constant doubling metrics. In particular, he proves 
that any n-point metric with doubling dimension dim admits WSPD in which the number of pairs 
is ne~^^^^^^ log$. We improve this result. 

Lemma 5.1 For 1 > e > 0, one can construct a e~"^-WSPD of size ne~^^'^^™\ and the expected 
construction time is 2'^('^'™)n log n + ne"'^*^'^'™) . 

Furthermore, the pairs of the WSPD correspond to {Pu, Pv), where u, v are vertices of a net-tree 
of P, and for any pair (P^, Pv) in WSPD, diam(P„), diam(P„) < e(ip(rep„, rep^). 

Proof: We compute the net-tree T using Theorem For concreteness of the WSPD, assume 
also that some weak linear order < is defined on the vertices of T. The WSPD is constructed by 
calling to genWSPD(iio, lio), where uq is the root of the net-tree T, and genWSPD(n, f ) is defined 
recursively as follows. 

genWSPD(u,t;) 

Assume i{u) > i{v) or ( i{u) = £{v) and u ^ v) 

(otherwise exchange u ^ v). 
If 8^ • r^(") < e ■ dAl(rep„,repJ then 

return | {n, v} } 
else 

Denote by ui , . . . , the children of u 
return U[=i genWSPD(uj, v). 

For any node u £ T we have diam(P„) < 2:^^ ■ r^(") (see Definition 12. 1|) . In particular, for 
every output pair {u, v} it holds 

max{diam(P„),diam(P^)} < 2^ • max|r^("),r^(^)} < |dp(rep„,repj 

< |(dp(P„, Pv) + diam(P„) + diam(P^)), 

and so max{diam(P„), diam(P„)} < j(jz^j^dp{Pu, Pv) < £dp(Pu, Pv), since e < 1. Similarly, for 
any x G Pu and y £ Pv, we have 

dp{repu,rep^) < dp{x,y) + diam(Pu) + diam(Pt,) < (1 + e)dp{x,y). 

One can verify that every pair of points is covered by a pair of subsets {Pu,Pv} output by the 
genWSPD algorithm. 

We are left to argue about the size of the output (the running time is clearly linear in the 
output size). Let {u,v} be an output pair and assume that the call to genWSPD('u, w) was issued 
by genWSPD(u, p(t')). We charge this call to p(f), and we will prove that each vertex is charged 

at most £^0{dim) ^ij^gg^ 

Fix v' G T. It is charged by pairs of the form {u, v} in which p(f ) = v' , and which were issued 
inside genWSPD(u, ?;'). This implies that £(p(u)) > iiv') > i{u). 
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Since the pair (n, v') was not generated by genWSPD, it must be that conclude that dp(rep^/, rep„) < 
g_2r_ .^£K)/e. The set 



U = <w 



iiViw)) > i{v') > e{w) and dp{rep^, ,TepJ < 8 



2r 



ST-l) 



contains u, and [/ is a subset of J\fc{(-{v'))- By Proposition 12 .21 for every ui,U2 G U, if ui / U2 
then dp{Pui, Pu2) > 'T^^^ )~^/4. By the doubhng property, we have \U\ < e-0(dim)_ therefore 
infer that v' can only be charged by pairs in [/ x C^', where C^' is the set of children of v'. We 
conclude that v' might be charged at most \U\ ■ |Ct,'| < (2/e)*^('^'™) = £-0(dim) ^jj^^gg^ Thus, the 
total number of pairs generated by the algorithm is ns-Oidim) _ _ 



5.1 Spanners 

Definition 5.2 A t-spanner of a finite metric space P is a weighted graph G whose vertices are 
the points of P, and for any x,y £ P, 

dp{x, y) < dcix, y) <t- dp{x, y), 

where do the metric of the shortest path on G. 



Theorem 5.3 Given an n-point metric P with doubling dimension dim, and parameter 1 > e > 0, 
one can compute a {l + e)-spanner of P with ne^^^^™^'^ edges, in 2'-^('^™^n log n + ne"*^'-'^'™) expected 
time. 

Proof: Let c > 16 be an arbitrary constant, and set 5 = e/c. Compute a decomposition 
using the algorithm of the previous section. For every pair {it,v} G WSPD, add an edge between 
{rep„, rep^} with weight dp(rep„, rep^). Let G be the resulting graph, clearly, the resulting shortest 
path metric do dominates the metric dp. 

The upper bound on the stretch is proved by induction on the length of pairs in the WSPD. 
Fix a pair x,y G P, by our induction hypothesis, we have for every pair z,w € P such that 
dp{z, w) < dp{x, y), it holds dc{z, w) < {1 + c5)dp{z, w). 

The pair x,y must appear in some pairjn, G WSPD, where x G Pu, and y G Pv. Thus 
(ip(rep„,rep^) < (1 + 2d)dp{x,y) and (ip(x, rep„), rep„) < 5(i_/vi(i'ep„,rep^), by Lemma 

By the inductive hypothesis 

dcix, y) < dcix, rep„) + dcirep^, rep„) + dG{Tep^,y) 

< {l + c6)dp{x, rep„) + dp(rep„, rep„) + (1 + cd)dp{vep^,y) 

< 2(1 + c5) - 5 ■ dp(rep„,rep^) + dp(rep„, rep„) 

< {l + 25 + 2c6^){l + 25)dp{x,y) 

< {l + e)dp{x,y), 

since 5c < e < I and 165 < 1 and c > 11. ■ 



19 



6 Compact Representation Scheme 



A compact representation scheme (CRS) of a finite metric space P is a "compact" data-structure 
that can answer distance queries for pairs of points. We measure the performance of a CRS using 
four parameters (P, S, Q, k ), where P is the preprocessing time of the distance matrix, S is the space 
used by the CRS (in terms of words), Q is the query time, and k is the approximation factor. 

The distance matrix by itself is a (P = 0(1),S = 0{n'^),Q = 0(1), k = 1)-CRS. The e"^- 
WSPD, as well as the (1 + e)-spanner are representations of (1 + 0(e))-approximation of the metric 
that consumes only g-Oidim)^ space. However, naively it takes ^}{n) time to answer approximate 
distance queries in these data-structures. 

In this section, we obtain the following theorem. 

Theorem 6.1 For any n point metric with doubling dimension dim, there exist: 
(a) (P = 2^('i''^)nlog2n + e-^(^™)n,S = e-^('i™)n,Q = 2°^'^''^\K = l + e)-CRS. 
(h) (P = 2^('i'>^) • poly(n) + e-^^'^'^^n, S = e-^^'^'^^n, Q = 0(dim),-K = 1 + e)-CRS. 

For general n-point metrics, Thorup and Zwick obtained a {kn^~^^/^ , kn^~^^^^ , 0{k), 2k — 1)- 
CRS, where A: G N is a prescribed parameter. The trade-off between the approximation and 
the space is essentially tight for general metrics. Closer in spirit to our setting, Gudmunsson et al. 
j2HI22j considered metrics that are t approximated by Euclidean distances in IR'^, where both d and 
t are (possibly large) constants. They showed that such metrics have (0(n log n), 0(n), 0(1), 1 + e)- 
CRS (The O notation here hides constants that depend on e, d and t). Our scheme strictly extends^ 
their result since metrics that are t approximated by a set of points in the d-dimensional Euclidean 
space has doubling dimension at most d\og{2t). We further discuss previous work on special type 
of CRS, called distance labeling, in Section lOl 

Our scheme is naturally composed of two parts: In Section [6. II we show how using the net-tree it 
is possible to convert an ^-approximate CRS into (1 -|- e)-approximate CRS in essentially O(log^) 
query time (and even O(loglogA) query time). We then show in Section [6.21 how to obtain 0(1)- 
approximate CRS using Assouad's embedding. In Section lH^ we observe that Assouad's embedding 
can be used in distance labeling schema. 

6.1 Approximation Boosting Lemma 

Assume we are given a data structure A, which is (P, S, Q, k)-CRS of a set P C M., where 7t < 3n^. 
In this section, we derive a CRS with improved approximation. Besides storing the data-structure 
of A, we also need the following data structures: 

1. The net-tree T augmented so that it supports the following operations: 

(a) O(logn) time access for ancestors of given level as defined in Section [1151 

(b) Constant time access for ancestor of given x, when the level is at most 61ogn levels 
below a given ancestor z. Again, Section [3.51 contains more information. 

(c) A constant time access for the lea of two vertices in T [^. 
^Caveat: They use a weaker model of computation. 
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2. A e~^-WSPD W on the net-tree T, with support for fast membership queries. For each pair 
we also store the distance between their representatives. Using hashing membership queries 
can be answered in constant time. 

3. The (3n^)-approximation HST H of Section f3. 21 The HST H should be augmented with the 
following features: 

(a) A constant time access to least-common-ancestor queries, after a linear time preprocess- 
ing m. 

(b) Each vertex u of H contains pointers to the following set of vertices in T 

Ku = {x £ T : dA^(rep^,rep„) < 4A„ and i{x) < log A„ < £(p(x))} . 

Note that \Ku\ < X'^^^\ and computing all these sets can be accomplished in X^^^^n log n- 
time by finding the level [logA^j] ancestor z of rep^ in T in O(logn) time, and then 
scanning Rel(z). 

All these data-structures can be created in 2'^('^™)n log n + £-0{dim)^ time and £-0(dim)^ space. 

Assuming Query-^(x,y) returns a value r/, such that dM{x-,y) /T^ < V ^ d_\4ix,y), the query 
algorithm is: 



Query-;B(x, y £ P) 




z ^ lca/f(x,y). 




u' <— ancestor of x in T among Kz, v' 


^ ancestor of y in T among Kz- 


7] <— Query-^(x, y). 




no <— ancestor of x in level [log(er/)J , 


vq <— ancestor of y in level [log(e?7)J . 


n <— uo, V ^ vq. 




while {n, v} ^ W do 




a m^)) < mv)) OT { m^)) = 


£{p{v)) and p{v) ^ p(u) ) then 


u ^ p(n) 




else 




V <— p{v). 




return (i_A4 (rep„, rep^). 





Implementation details: u' is found by scanning all vertices in Kz (there are only A*^*-^-* such 
vertices), and checking which one of them is an ancestor of a; in T (ancestorship can be checked using 
the lea operation on T). Note that an ancestor of x must be contained in Kz, since dj\4{iepz: x) < 
Az, and thus the ancestor of level immediately below log must be in Kz- Similar thing happens 
with v'. Both T] and A^ are 3n^ approximation to d_M{x,y) and therefore £{u') — £{uq) < 41ogn-|-3, 
hence uq can be accessed in constant time. The same goes to vq. 

The following lemma is immediate consequence of the way the WSPD algorithm works. 

Lemma 6.2 For a pair {s,t} G W (the e~^-WSPD), and £{s) < £{t), one of the following condi- 
tions must he satisfied: 

1. l{s) <i{t) <^(p(s)) and:^-T'^(p(^)) > e-d^(repp(,),repi), and:^-r^W < e-d^^Crep^, repj. 
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2. i{s) < l{t) = £(p(s)), and p(s) < t, and ^ • r^(P(^)) > e ■ (iA4(repp(,), rep^). ^ • < 
e ■ d^(rep^,repi). 

Proposition 6.3 The while loop finds a pair in W after Oilog'R) steps. 

Proof: Denote by {uq, vq} the pair with which loop begin with. It is straightforward to see that the 
loop climb through all ancestor pairs {n, of {uq,vq} that satisfy either (i) i{u) < £{v) < £{p{u)), 
or (ii) £{u) < i{v) = i{p{u)) and p{u) ^ v. 

Thus, if exists an ancestor pair in W, it will be found by the loop. As we argue in Lemma l5.ll 
there exists an ancestor pair {n, v} of {x, y} in W. Our choice {uq, vq} ensures that uq is descendant 
of u at most 0(log7t) levels down T, and the same goes for vq and v. ■ 

Combining the above claims, implies the following: 

Lemma 6.4 Let P be a n-point metric. Assume we are given a {P,S,Q,'K)-CRS A of a set P, 
where K< 2,n^ . Then, one can obtain {?' ,S' ,Q\l + e)-CRS B ofP, where P' = P + 2*^(^^)71 log n + 
gO(dim)^^ S' = S + e-O(dim)^^ Q' = Q + 0(log k). 

Remark 6.5 The dependence of the query time on k can be improved from 0(log k) to 0(log log k) 
without sacrificing any other parameter. The idea is to replace the "ladder climbing" in the al- 
gorithm above (the while loop) with a binary search on the log/? levels. To do so we change the 
WSPD procedure to output all pairs it encounters. This clearly does not change asymptotically 
the size of W. We do a binary search on the logK relevant levels to find the lowest level pairs which 
still appear in the WSPD, and this gives as the relevant pairs. We do not pursue this improvement 
rigorously, since in the CRS we develop in the next section, the query time Q dominates k anyway, 
and thus this would lead to no asymptotic saving in the query time. 

6.2 Assouad Embedding 

To obtain a constant approximation of the distance quickly, we will use a theorem due to Assouad |2] 
(see also [271 123| ). The following is a variant of the original statement, tailored for our needs, and 
its proof is provided for the sake of completeness. 

Theorem 6.6 Any metric space Ai with doubling dimension dim, can be embedded in i'^, where 
d < £-C>(dim)^ such that the metric {M, yJdjwCj is distorted by 1 + e factor. 

Proof: Fix r > 0, we begin by constructing an embedding (j)^'^^ : M — > IR'^^, where di = 
with the following properties: For every G M: 

1. - <t'^''\y)\\^ < min{r,dA4(x,y)}. 

2. lidM{x,y) e [(l + e)r,2r) then ||(/)W(j;) - (/)W(y)||^ > (1 - e)r. 

We take an er-net AA^^^^ of A4 and color it such that every pair x,y £ M^"^^ for which dj^i^, y) < 
4r, is colored differently. Clearly, di = e-'^C^™) colors suffices. Associate with every color i a 
coordinate, and define for x G 7W, 4>f\x) = max{0, r — (i^(x,Q)}, where Cj C M^^'^ is the set of 
points of color i. 
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We next check that the two properties above are satisfied. As 4)[\x) G [0,r], it is clear that 

~ ^ for every color i. The 1-Lipschitz property easily follows from the triangle 

inequality. 

Next, assume that dM{x,y) G [(1 + e)r, 2r]. Since AA^'")) < er, there exists a color i 

for which dj^{x,Ci) < er. This implies (by the triangle inequality) that d_\4{y,Ci) > r, hence 

~ 4'i"\y)\ ^ (1 ~ Hence, the concatenation of all these coordinates, c/)^'"-' = ©i(/)-^^ 
satisfies the condition above. 

Let d2 = 8e~^ log(e~^). The final embedding : — > IR"'^'^^ is done by combining a weighted 
sum of (j)^'''^ as follows. Let Mi{x) denote the matrix of size d2 x di, such that it is all zero, except 
the (/ (mod d2))th row, which is ipli^) = (/''-^^^^•''^(x). Then 



Miix) 



(l+e)V2- 

To see that the embedding is 1 + 0(e) approximation of fix a pair of points x,y G A1, 

and let S 2 such that d_x4{x, y) G [(1 + e)'""*"-^, (1 + e)'"+^). Then in the relevant coordinates the 
ioo distance between x and y is 



fcez 



>\\'^io{x) - i'h 



oo - ^Wi^lo+dikix) - i'lo+d^k 
k<0 



E 



+d2 



k{x) - 1plo+d2k 



k>0 
Vo/2 



(1+^) 



2+/o 



En _|_ p\2+lo+d2k 
Ai±£i V — 
(1 + £)(Zo+rf2fc)/2 (1 + e)0o+d2k)/2 



> (1 - e) • (1 + e)'«/2 _£.(! + e)'«/2 _£.(!+ e)'o/2 > _ 0(e)) ^/^^^(x, y). 
On the other hand, for each j £ {0, . . . ,d2 — I}, 



^{lpl0+j+d2k{x) - 11^10+ j+d2k{y)) 

'^\\'>Plo+j+d2k{x) - 'iplo+j+d2k 



fcez 



+ 



fc<0 



'^\\4^lo+j+d2k{x) - 'll^io+j+d2k{y)\\, 



< 



k>0 

(1 _|_ ^y+lo+j+d2k _|_ £^2+«o 

E (i+g)(/o+.-+d.fc)/2 + E (i+g)(/o+.-+d2fc)/2 = (1 + 0(£))x/^^A^(x,y). 



Hence \\(p{x) — 4>{y)\\f^ is 1 + 0{e) approximation to y^d^(x,y). ■ 
The relevance of Assouad's embedding to compact representations is clear: Intuitively, (/'(x) is 
short, and given (j){x) and (l){y), we can compute the square of the ioo norm of the difference, and 
obtain 1 + e approximation to dj^ix, y)- Note however, that in order to be able to do it, we need to 
store 0(log(<^/e)) bits for each real number, which may require many words to be represented in 
our computation model (see Section [21). We solve this issue in Lemma l6.9l bv reducing the problem 
for metrics with arbitrary spread a to a set of similar problems on metrics with only polynomial 
spread, on which Assouad's embedding can be applied. 
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Lemma 6.7 Given n-point metric M with a polynomially hounded spread ^ and doubling dimen- 
sion dim, an Assouad's embedding (with parameter e) of M can be computed lOET n 
time. 

Proof: We follow closely the proof of Theorem 16.61 For each scale (1 + eY, we find a e(l + e)'-net 
j\^{(i+e)0 fj-om the net-tree which is 0{e{l + e)') cover and Q,{e{l + eY) separated in 0(n) time. 
We define a graph on this net: two points are connected by an edge if they are at distance at 
most 4(1 + e)'. This can be done in ^-Oidim)^ time using a variant of Rel() sets (basically, we 
compute sets like Rel() that contain points at distance at most 0{e~^) times the current scale, 
instead of 13 times the current scale). We then partition J\f^(^+'^'^ ) to color-classes using the greedy 
algorithm. Implemented with hashing, it works in expected 0{n) steps. Next, for each color class 
we construct a (l + e/2)-ANN data structure, and we thus can compute an (1 + e/2)-approximation 
to dM{x,Ci). Note that in the proof of Theorem 16.61 by enlarging the constants a little bit, 
(1 + e/2)-approximation suffices. We repeat this construction for the log^^^ $ levels in the metric. 
The rest of the embedding calculation is straightforward. 

The running time of the algorithm is therefore e~'^('^™)n log n log ■ 

Remark 6.8 We believe that for e = 100, a similar embedding can be constructed directly on 
the net-tree in 2'^('i''^)n time. The construction seems however much more complicated than the 
one described in Lemma 16.71 We have therefore decided that the slight gain in preprocessing time 
(overall, a factor of log n, since the running time for constructing the net-tree is 2'-^('^™)nlogn) does 
not worth the complications. 

Lemma 6.9 If there exists a (P, S, Q,k)-CRS A for an n-point metric with doubling dimension dim 
and spread < 3{n/e)^'^ , and ifP is concave. Then there exists (P(4n) + 2'^('^™)nlogn, S-\-0{n), Q-\- 
0(1), (1 + e)'K)-CRS B for finite dim-doubling dimensional metrics, without any assumption on the 
spread. 

Proof: Denote by H the low quality HST of Section 13.21 which is 3n^ approximation to the given 
metric Ai. 

Set ai = and 02 = [5(log(e~^) + log2 n)] . Apply the following procedure on H to obtain new 
HSTs Hi, i £ {1, 2}. Scan H top down. Retain the root, the leaves, and all internal vertices u £ H 
with the following property: there exists 6 > such that log2 h = ai (mod |"lO(log(e~^) + log2 n)] ) 
and Ap(u) > b > A„. The HST Hi is constructed naturally on the retained vertices: A retained 
vertex u is connected to a parent v in Hi if v is the lowest retained ancestor of u in H. 

Next, for each non-leaf vertex u £ Hi, i £ {1,2}, denote by C{u) the set of children of u. We 
observe that R{C{u)) = {rep„| u £ C{u)} has a spread at most 3(n/e)^^. To see this, note that 
diam(i?(C(ti))) < A^, and on the other hand let b the largest real number such that b < A„, 
and log 6 = Oi (mod |'lO(log(e~"'^) log2 n)] ). Obviously b > Au/{n/ey^ and for every x,y £ C{u), 
A|ca^(a:,j/) ^ b, and therefore dj^{x,y) > 6/(3n^). Thus for each internal vertex u £ Hi we 
can construct a K-approximate CRS A to R{C{u)). The whole construction time is therefore 
20(dim)^^Qg^_^^^p(^^^) < 2<^('i''^)nlogn + P(4n). 

We equip H, Hi and H2 with a data structure for handling queries for least common ancestor 
and finding an ancestor at a given depth, both in constant time. 

A distance query for the pair x,y £ M. is processed as follows. Let n, = Ica/f. (x,y). let Xi be a 
child of Ui which is an ancestor of x in Hi, and similarly y^. Note that Ui,Xi,yi can be computed 
in constant time using the lea and depth ancestor queries. 
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Further observe that 3i £ {1,2} for which maxjAj;., Ay-} < A|^a//(z,j/)/("'/^)^; finding this 
i is an easy task. 

We next query the CRS A of R{C{ui)) for approximation of (i_A/( (rep^, . , repj^ J . From the above 
we deduce that 

max{d;K(x,rep^J,dA4(y,repj^j} < ^ • ^^^^^2'^^ - '-^'^M{x,y)- 
and therefore we have obtained 7t(l + e) approximation to dMi^iV)- ■ 

Corollary 6.10 Every n point metric with doubling dimension dim has (P = e~'^('^™^nlog^ n, 
S = e-o(dim)^^ Q ^ g-o(dim)^;^ ^ 1 ^ e)-(7i?5. 

Proof: Combine of Lemma 16.91 and Lemma 16.71 ■ 
Note that in Corollary I6.1UI the query time depends on e, in contrast to the claim in Theo- 
rem a). This can be remedied using Lemma 16.41 

Proo/.'[Proof of Theorem 16.11 (j^] Use the CRS of Corollarv I6.1UI with constant Eq = 0.1 as the 
bootstrapping CRS in Lemma 113 ■ 
Proof:[Piooi of Theorem 16.11 (|bj) ] In j22|, an alternative proof for Assouad Theorem is given 
with much improved bound on the dimension of the host space: They prove that for any metric 

{Ai, (i_A/() with doubling dimension dim, it is possible to embed {M., d^J^) in £00 ^ with distortion 
0(dim2).4 

This embedding can be done in polynomial time. Using it as a replacement for Lemma 16.71 we 
therefore obtain the claimed CRS. ■ 

Remark 6.11 The distortion of embedding into poly(dim) dimensional normed space can not 
be improved below 1.9, since such an embedding gives 1.9 approximate CRS which use only 
0(npoly(dim) log 0) bits of storage with label length which are polynomially dependent on dim 
(see Section f6.3() . but Talwar 44J have shown that such CRS necessarily use at least n2^(<^™) bits, 
which is impossible for dim = (log log n). In this sense the embedding technique of j22j can not 
replace Assouad's original technique. 

It is still open whether the construction time in Theorem 16.11 (b) can be improved to near- 
linear. The difficulty lies in the algorithmic version of the Lovasz Local Lemma. As discussed in 
Remark ESI distortions as high as 2^°''^'™' are tolerable in this context. 

6.2.1 Lower Bound 

We next argue that beating the O(dim) query time using schema similar to the one presented above, 
is unlikely. 

For given reals di, D, d2 > 1, we say that {di, D, d2)-Assouad-type-scheme (ATS) exists if there 
is a monotone increasing bijection / : [0, 00) — > [0, 00), such that for all finite metric spaces (P, ^ai), 
with doubling dimension at most di, there exists cj) : P ^ M||^|| , such that for x,y £ P, we have 

dM{x,y) 



D 



<f(\\ct>{x)-4>{y)\\x)<dM{x,y). 



^If one wants to optimize the distortion using their technique, then it is possible to obtain O(dim) distortion when 
embedding into ^^{dimiogdim)^ 
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For example, the embedding of cited above is (di, 0((if ), 0((ii))-ATS for any di > 1, and 
it uses f{x) = . 

Proposition 6.12 If d2 < c?i/5, then for any D > 1, no {di,D,d2)-ATS exists. 

Proof: The argument distinguishes between two essential cases: "Concave" function / can not 
be used in any ATS since it causes a violation of the triangle inequality. For "convex" functions 
/ we slightly generalize an argument from [2j that uses topological considerations (Borsuk-Ulam 
theorem) to conclude the impossibility. 

Indeed, fix a {di,D, d2)-ATS with a function /, where d2 < di/5. Denote by g : [0, oo) [0, oo) 
where g = f~^. 

Suppose first supo<a<fe<oo g(a)/a ~ ^ ( "coucave /"). Fix a and b, such that < a < 6 < oo 
and g^^^li'^ ^ 100-D. Let n = [2Z)6/a], and let P be the line metric on {0, ...,n} such that 
dj^{i,j) = a\i — j\. By the assumption, there exists cj) : P ^ ]R|^|| , such that — + < 
gidMihi + 1)) = 5(0)1 and on the other hand 

= < 110(0) -*(„)ii,. 

since g is monotone increasing, as / is monotone increasing. Then by the triangle inequality 

bg{a) 



n 



g{b) < \\m - Hn)\\x < - 1) - mix < ^aia) < ad 



■ 1 " 



which implies that -4^rr- < 41?, which is a contradiction. 

Next, assume that there exists C > 1 such that supo^^<^^oo ^<^)°h — ^ ("convex /"). Then, 

for any a < b we have < ^(fl)- In particular, we have ^^^-^ ^^cd]!4{x'y)'^'^^^'^ — 9idMix,y)/ D). 
Namely, 

g{dM{x,y)) . fdM{x,y)\ . ^ /, / ^^ 
^7-^ < 9\ ) ^ll'/'(^) -'/'(y)llx ^ 9{dM{x,y)). 

Since is d2 dimensional, by John's theorem (see [HI Ch. V]) it can be approximated by||-||2 up 
to a \/7i2 factor. We thus have a C > 1 such that for any di dimensional finite metric {P,d_\4), 
there exists (j)' : {P,dM) satisfying 

g{dM{x,y)) , Nil ^ /, . XX (-.-s 
-^1 <\\(l){x)-(t){y)\\^<g{dM{x,y)). (1) 

We next estimate how much g o djvi distorts dj^ as a function of the spread of P. Assume that 
min^^^ygP (i_A/((x, y) = ai, and max2.^ygp d^(x, y) = 61, that is ^{P) = ^i/ai- Then 

9{t) 9{ai) 9{t)ai „g(ai) 
max = • max — - — - < G , 

ai<t t ai ai<t tg{ai) ai 

. s bi g{bi)s . „ 61 

and max —— = ——— ■ max - — -— < - 



s<bi g{s) g{bi) s<bi big{s) g{bi) 
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Thus, consider the "distortion" of g, we have 



$(P) 



C 



g(ai) 



■C 



< 



9(bi) 



bi/ai 



C 



2 g(QiJ 



As g{{)) = and hm^_>oo g{x) = oo we conclude that this ratio tends to as the spread $(-P) tends 
to oo. Combining it with we conclude that for cf) : {P,dM) ^'^\\ ' defined as (p{x) = (p'{x), 

have dist(0) = o(<I>(P)). We will next show that this is impossible when P is sufficiently dense net 
of §^^2. 



Let < 7] < 0.1. We take P = P„ to be a r?-net of 



The finite metric Prj has doubling 

d2 



dimension at most di. From the above we can embed : P^ ^ H^j^l^^ with distortion o($(P)) = 

o(?7~^). By scaling we may assume that this embedding is 1-Lipschitz. By Kirszbraun Theorem 
(see Ch. 1]), the embedding (p' can be extended to the whole sphere (j)' : §^^^^ — > M|j^|| without 

increasing the Lipschitz constant. Borsuk-Ulam theorem (cf. (3^) states that there exists x ^ S'' 



such that 4>'{x) = 4>'{—x). Note that 3y,z G P^ such that ||x — y\\2 < f/j and||(— x) 
(J)' is 1-Lipschitz, we have 

U{y)-ct>'iz)\\= ^\y)-^'{z) < ^'(y)-0'(x) + ^'(-x) - 0'(z) 



< rj. Since 



< 27]. 



On the other hand \\y — z\\2 > 1 — 27], which means that the Lipschitz constant of thus 
the distortion of (j)' , is at least Q,{r]~^). This is a contradiction for sufficiently small r/ > 0, since we 
argued above that the distortion must be o(<I>(P)) = o(r/~^). ■ 



6.3 Distance Labeling 

Approximate distance labeling scheme (ADLS) seeks to compute for each point in the metric a short 
label such that given the labels of a pair of points, it is possible to compute efficiently an approxi- 
mation of the the pairwise distance. Thus, ADLS is a stricter notion of compact representation.^ 
This notion was studied for example in |^ 1321 • 

In the constant doubling dimension setting Gupta et al] [SH] have shown an (1 + e)-embedding 

of the metric in 1^^°^^-' . This implies (1 + e)-ADLS with 0(lognlog$) bits for each label (the 
O notation here hides constants that depend on e and dim). Talwar 44J has shown an improved 
(1 -t- e)-ADLS with only e-O(dim) j^g^ t^^^^ label. Slivkins i43j has shown a (1 + e)-ADLS with 
g-O(dim) jQg2 7^ log log cj) bits per label. Their techniques seem to be very different from each other. 
Here we improve Slivkins' result and unify it with Talwar result under the same technique. 

Proposition 6.13 Given a finite metric space, one can build a (1 + e)-ADLS with 

min|e-^('^''^) log e-O(dim) j^g ^^j^g ^ ^ ^og log ^>)} 

hits per label. 

Furthermore, there exist one dimensional finite metric spaces of size n, and spread <I> > 2^" for 
which any 1.9- ADLS requires labels of size r2(log n log log <I>) bits per label. 

^When comparing the storage of ADLSs to that of the CRSs from the previous sections, note that here we count 
hits, whereas in the rest of the paper we count words of length 0(log n + log log $ + log e~^). 
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Proo/; [sketch] First, labels of length £-C'(dim) ^ follow directly from Theorem 16.61 We have 
g-O(dim) coordinates, and, as discussed after the proof of Theorem I6.6( we only need 0(log(<I>/e)) 
bits of accuracy for each coordinate. 

We next show approximate distance labeling scheme using £~0{dim) logn(logn+loglog $) 

bits per label. We do so by presenting a "distributed implementation" of the data-structure used 
to prove Corollarv 16.101 That data structure consists of two trees (HSTs) Hi, H2 on the same set 
of leaves: the points of the metric. Given two points x^,x^, we compute Ui = Icajf. (x^, x^), and 
xl the ancestor of in Hi which is the child of Uj. We then apply an Assouad embedding A{xl) 
that uses 0(logn + loglog<I> + log(e^-'^)) bits. We define an identifier I{v) of vertex v G Hi to be 
A{v) concatenated with the A^, (encoded with 0(loglog<I>) bits). Hence, given two points x^,x'^, 
using the identifiers I{x\), /(xf), lix^), I{^2)^ -^(""i)) we can compute 1 + e approximation 

of d_\4{xi,X2)- We now use (the proof of) a result of Peleg P^j: Given an n-vertex rooted tree 
with identifiers I{v) of maximum length s on the vertices, it is possible to efficiently compute labels 
L{v) of length 0(log n(log n + s)) to the vertices, such that given L(x) and L{y) one can efficiently 
decode I{u), where u = lca(x,y). 

Unfortunately, we need a little bit more: an access to the children of u which are the ancestors 
of x and y. In order to achieve it we tinker with the construction of Peleg: In Definition 3.2 in |39j . 
we extend the tuple Qi{v) to be 

Q^(^) = (|(^-1, /(7»-iW)),(^,^(7» W)),(^ + 1,^(7^+1 W)), (^,/(hs(7i(^)))) y 

where hs(ti) is the heavy sibling of u (the underlined part is our extension). By studying Peleg's 
construction, it is easy to verify that this extension suffices. 

The above construction is asymptotically optimal in terms of n and $ when $ > 2^", as 
we now prove. In a family of n-vertex weighted rooted binary trees, such that any exact 
distance labeling scheme of the leaves requires labels of length 17(lognlogM) bits, where the edge 
weight is in the range {0, . . . ,M — 1}. A further property of that family of trees is that the depth 
h = Mlog2n (i.e., the distance from the root) of all the leaves is the same. We next transform 
each tree T in that family into an HST H by giving every vertex v a label 2~'^'^P*''^r(^). For any 
two leaves x and y let dT{x,y) = 2{h + log2 di^ (x, y)). Furthermore, even 1.9 approximation of 
dH{x,y) allows us to recover the exact value of dH{x,y), since this value is an integral power of 2. 
Let us summarize: Given a 1.9 approximation of the distance in H allows us to obtain the exact 
distance in T. Therefore by setting M = (log2$)/n, it proves a lower bound of r2(lognloglog $) 
on the average label's length for 1.9-ADLS for this family of HSTs. Since these HSTs are binary 
their doubling dimension is 1. ■ 

After a preliminary version of this paper appeared, Slivkins 0^ managed to produce an ADLS 
with labels length of e~'-^('^™Hognloglog$, which improves upon our construction in the range 
^loglogn <^ $ <^ 2". 

7 Doubling Measure 

A measure on a metric space M is called ?7-doubling if for any x £ A4 and r > 0, n{h{x, 2r)) < 
rj ■ n{h{x , r)) . Doubling measure is already a useful notion in analysis of metric spaces (see jHI)) and 
has recently been used in some algorithmic applications ^43]. Vol'berg and Konyagin jU| proved 
that any compact A doubling metric space has A'-^^^^-doubling measure (the opposite direction is 
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easy). Wu's proof of this theorem 1H] can be implemented in hnear time on the net-tree (for finite 
metric spaces). 

We assume that the net-tree T is already given. Denote by deg{v) the number of children of 
V gT. Let 7 = maXtjgT deg(t') be the maximum degree in T. As we have seen before, 7 < 2*^(*^™'*. 
The probability measure fi is computed by calling to Partition(root, 1), where Partition is defined 
recursively as follows. 

Partition (-u eT,pue [0,1]). 
if n is a leaf then 

Set /i({rep„}) ^ p„. 

else 

for each child u of n with rep^ / rep„ do 

Set py ^ puh- 

Call Partition(t;,p^). 
Let vq be the unique child of u such that rep^^^ = rep„. 
Set p„o ^ pu{l - (deg('u) - l)h). 
Call Partition(t'o,Pi,o). 



Claim 7.1 For any u & T, we have pu = fM{Pu)- 

Proof: By straightforward induction on the height of T. ■ 

Claim 7.2 Fix I £ N, and two vertices u and v in T , such that max{£(n), ^(v)} < / < min {£(p(n)), ^(p(w))} 
and (i;v4(repu,rep^) < 40t'. Then pu < j^'^^^Pv 

Proof: Denote hy w = Icariu, v), and hy w = uq, ui, . . . ,Ua = u the path in T from w to u, and by 
w = vo,vi, . . . ,Vh = V the path in T from w to v. 

We claim that for any z > 1, if i{ui) > / + 3, then rep„. 7^ rep^.^^. Indeed, otherwise 

c^A4(rep„^,rep„) < (i;vi(rep„^^^,rep„) + (i;vi(rep„,rep„) 

< _2i_ . -^^K+i) _|_ 40-7-' < -2_T-^K) _)_ 40r~'* • t^("») < r^("») 

but this is a contradiction to the packing property of Definition 12.11 since v ^ . (note that for 
this argument to work, r need to be large enough constant, say 11). 

Next, we claim that for any i > 1 for which i{ui) > / + 3, £(uj_i) = £{ui) + 1. Otherwise, 
(^(ui^i) - 1 > i{ui) + 1 implying 

t^>l(rep„i,rep„) < a!Ai(rep„^^^,rep„) + dA^(rep„, repj < ^ • r^^"') + AOt'^ ■ r^("") 

^t(t— 1) ' — ' ' ' 

contradicting the packing property of Definition 12.11 since v ^ Pu^- 

Thus, the path between u and w is full, containing vertices on all levels, except maybe the last 
three levels. Furthermore, the representatives are different in each level. We therefore conclude 
that Pu < Pw ■ On the other hand, p^ > Pw^^^^'^~^~^^ ■ Therefore pu < "J^Pv ■ 

Theorem 7.3 For any n-point metric space having doubling dimension dim it is possible to con- 
struct a 2*^('^'™) doubling measure in 2'^*^'^'™)nlogn time. 
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Proof: The running time of Partition is clearly linear, and is dominated by the time to construct 
the net-tree. 

We are left to prove that is a A^(i)-doublinK measure. Let x G P and r > 0. Denote by 
N = £ T £{u) < log^{r/8) < £{p{u)) |. As we have seen in Proposition 1^21 the representatives 
of the vertices of N forms a net in the right scale. In particular, there exists x £ N such that 
rep^) < 3r/8 and C b(rep2,3r/8) C h{x,r). Hence Px < fi(h{x,r)). On the other hand, 
any two different representatives of vertices from are at least r/40 separated, and therefore, for 
A = A n |n G T rep„ G b(j;,3r)|, we have |A| < A'^(^). Note that h{x,2r) C UuexPu, and 
therefore 

H{h{x,2r)) <y^Pu< |A|maxp„. 

By Claim lO max„gxP« < A*^(^)p2. We conclude that //(b(x,2r)) < A*^(^V(b(a;, r)). ■ 
We note in passing that algorithm Partition can be programmed in our computational model 
since every point gets at least 2~'^("'^°s") measure, which can be easily represented in a floating- 
point word of length O(logn). Moreover, the algorithm has a "built in" mechanism to handle 
rounding error: instead of dividing by 7, we can divide by say 27 and now rounding errors are 
automatically offset in the measure given to vq. 



8 Lipschitz Constant of Mappings 

Definition 8.1 A function / : (P, z^) {M,p) is K -Lipschitz if for any x,y £ P, we have 
pifix),f{y))<K-u{x,y). 

A point X G P is K -Lipschitz if, for any y G P, we have p{f{x), f{y)) < K ■ v{x, y). 

Thus, given a set of points P C IR°', and a mapping f : P ^ IR*^ , it is natural to ask how 
quickly can we compute the Lipschitz constant for / on the set P, and more specifically, to compute 
it for every point of P. 

8.1 The Low Dimensional Euclidean Case 

Here, we consider a mapping / : P ^ (M, p), where P C ]R of size n, and (M, p) is an arbitrary 
metric space given as a matrix. 

Proposition 8.2 Computing the Lipschitz constant for f on P can be done in O(nlogn) time. 

Proof: Indeed, let a, b, c be three numbers in P, such that a < b < c. Observe that 

P{f{c)j{a)) < p{f(,c),f{b))+p{f{b),f{a)) ^^^^ PU{c)J{b)) ^ /(a)) \ 

c — a ~ c — b-\-b — a ~ \ c — b ' b — a / 

since for any p, q, r, s positive numbers such that p/q < r/s, we have p/q < (p + f)/iQ + s) < r/s. 
Thus, the Lipschitz constant is realized by a consecutive pair of points in P. We can therefore 
sort P, and compute the slope for every consecutive pair. Clearly, the maximum is the Lipschitz 
constant of /. ■ 

Proposition 8.3 Let P be a set of n numbers on the real line, and let f : P ^ TR be a given 
mapping. One can compute the Lipschitz constant of f on every point of P in O(nlog^n) time. 
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Proof: Consider the set Q = {{p, f{p))\ p G P}- Let p be a point in P, and let Lp be the set of 
points of Q strictly to the left of p (according to the x-order), and Rp the set of points to its right. 
Denote by C7i{A) the convex hull of ^ C M'^. If we know the tangents to C7i{Lp) and C7i{Rp) that 
passes through p, then we can compute the Lipschitz constant of p in constant time (i.e., it is the 
slope of the tangent with largest slope). 

Here, one can use the data-structure of Overmars and van Leeuwen j38j. which supports the 
maintenance of convex-hull under insertions, deletions and tangent queries in 0(log^ n) per opera- 
tion. Indeed, sort the points of P from left to right. Let pi, . . . be the sorted points. Clearly, 
given CTC{Lp.) and CTC{Rp.) stored in the dynamic convex-hull data-structure, we can compute 
CTi.{Lp.^-^ and C'H{Rp^^-^, by deleting pj+i from CTi.{RpJ, and inserting pi to C7i{LpJ. Thus, we 
can compute all the relevant convex-hulls in 0(n log'^ n) time. Furthermore, when we have C7i{Lp^) 
and CTi.{RpJ, we perform tangent queries to compute the Lipschitz constant oipi. Thus, the overall 
running time is 0(n log'^ n). ■ 

Theorem 8.4 Given a set P of n points in the plane, and a mapping f : P ^ TR, then one can 
compute the Lipschitz constant of f in O(nlog^n) expected time. 

Proof: Assume that we know that / is i^-Lipschitz on a set Q P, and we would like to verify 
that it is i^'-Lipschitz on {q} U Q, where q £ P \ Q. This can be visualized as follows: From every 
point p P, there is an associated point in IR'^, which is p = {px,Py, f{p))- Being a i^T-Lipschitz as 
far as p is concerned, implies that q must lie below the upper cone of slope K emanating from p, 
and above the lower cone of slope K emanating from p. Thus, if we collect all those upper cones, 
then q must lie below their lower envelope. However, since the upper cones all have the same slope, 
their lower envelope is no more than a (scaled) version of an additive weighted Voronoi diagram in 
the plane. Such a diagram can be computed in O(nlogn) time for n points, and a point-location 
query in it can be performed in O(logn) time. 

In fact, using the standard Bentley and Saxe technique 0, one can build a data-structure, 
where one can insert such upper cones in O(log^n) amortized time, and given a query point q in 
the plane, decide in 0(log^ n) which of the cones inserted lies on the lower envelope vertically above 
q. Similar data-structure can be build for the upper envelope of the lower cones. 

Thus, if we conjecture that the Lipschitz constant is then one can verify it for P in 
O(nlog^n), by inserting the points of P into the upper and lower envelope data-structure de- 
scribed above. However, let assume that K is too small. Then, after inserting a subset Q of points 
into the data-structure, we will try to verify that the Lipschitz constant for a point p £ P is K and 
fail. Then, it must be that the Lipschitz constant of / on Q U {p} is realized by p. Thus, we can 
compute the Lipschitz constant of p in Q U {p} in 0(|(5|) time, updated our guess K, and rebuild 
the upper and lower data-structures for Q U {p}. 

Of course, in the worst case, this would required 0(n^ log^ n) running time (i.e., we would fail 
on every point). However, it is well known that if we randomly permute the points, and handle the 
points according to this ordering, then the value of the Lipschitz constant on every prefix would 
change O(logn) times in expectation. Thus, this would lead to O(nlog^n) expected running time. 
Moreover, a slightly more careful analysis shows that the expected running time is 0(nlog^ n). See 
[16] for details of such analysis. ■ 
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8.2 Constant doubling dimension to arbitrary metric 

Theorem 8.5 Given a metric {P, u) of n points having doubling dimension d, and a mapping 
f : P ^ {Ai, p), where M is an arbitrary metric space. Then one can compute (l+e)- approximation 
of the Lipschitz constant of f in ne~'-'^^^ log^ n expected time. 

Proof: The algorithm: 

1. Compute e~^-WSPD of P according to Sectional 

2. Set K ^0. 

3. For every pair (A, B) £ e^^-WSPD do: 

(a) Obtain some pair of points a £ A and b € B. 

(b) Compute K ^ max jiT, ^^^^j^^]- 

Obviously the value K computed by the algorithm above is not larger than the Lipschitz con- 
stant of /. We next show that it is not much smaller. Let x,y G P he a pair in which / obtains its 
Lipschitz constant, i.e., ^^^^g'^^j^^^ = max^^t ^^^T^ff^- Let {A,B} G WSPD be a pair such that 
X E A, y G B. Our algorithm chooses some pair a G A, b £ B. Using the triangle inequality we 
have 

p{f{a),f{b)) ^ p(/(x),/(y))-diam(/(A))-diam(/(i?)) 
i'{a, b) ~ ^{x, y) + diam(^) + diam(i?) 

> p{f{x),f{y)) - diam(/(A)) - diam(/(i?)) 
il + 2eMx,y) 

If max{diam(/(74)), diam(/(i?))} < e • p{ f{x), f{y)) then we conclude that 

p{f{a)J{b)) ^ {l-2e)p{f{x)J{y)) 
u{a,b) - {l + 2e)u{x,y) 

and we are done. Otherwise, assume that diam(/(A)) > e ■ p{f{x),f{y)). Then there exists 
/(ai), 7(02) G f{A) for which p{f{ai),f{a2)) > e ■ p{f{x), f{y)), whereas 

iy{ai, 02) < diam(A) < e ■ u{A, B) < e ■ i^{x, y). 

So 

p(/(ai),/(fl2)) ^ e-p{f{x)J{y)) 
v{ai,a2) e-u{x,y) 

which is a contradiction to the maximality of the pair {x, y}. ■ 



9 Fast approximation of the doubling dimension 

Theorem 9.1 Given a metric space M with n points, one can approximate the doubling dimension 
dim of J^, up to a constant factor, in 20(dim)„ 

logn expected time. 
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Notice that this theorem, apart from its intrinsic interest, also removes the need to specify dim 
together with the input for the other algorithms in this paper. 

The algorithm suggested in Theorem 19 . II naturally uses the net-tree. 

Proposition 9.2 Given a net-tree T of a metric M, and denote by Xt the maximum out degree 
in T, then log At is a constant approximation to dim(A4). 

Proof: Let v £ T he the vertex with the maximum number of children At- By Definition 12.11 any 
covering of b(rep^, ■:^t^^'"^), by balls of radius ^^^(^^i'^ requires at least At such balls. This means 
that dim(X) = 17(logAT). 

The upper bound dim(AI) = O(logAT) follows easily from the arguments of Section [7| There, 
we actually prove the existence of A^^^^-doubling measure in A4, and it easy to prove that the 
existence of a doubling measure in Ai implies that dim(A^) < a. ■ 

Proo/.-[Proof of Theorem 19. Ij By Proposition 19 . 21 it is enough to show an implementation of the 
algorithm for constructing the net-tree that is oblivious to the the doubling dimension of the metric. 
Checking the algorithm in Section |21 we observe that the algorithms in Section [3. 11 Section 031 and 
Section f3. 41 are indeed oblivious to the doubling dimension. We are therefore left with describing a 
doubling dimension oblivious algorithm for constructing HST that O(n^) approximates the given 
metric. More specifically, the only part need to be changed is the use of Lemma 12.41 in Lemma l3.51 
To this end, instead of knowing A, we "guess" the doubling constant to be 2*, increasing i until we 
"succeed". More accurately, in the ith iteration, we apply the following sampling step 2^* times: 
Pick randomly a point p from P, and compute the ball h{p, r) of smallest radius around p containing 
at least n/{2 ■ 2^*) points. Next, consider the ball of radius b(p, 2r). If it contains < n/2 points the 
algorithm succeeded, and it stops. The algorithm is guaranteed to stop when i > [logn]). Denote 
hy 5 = S{X) the random value, which is the value of 2* when the algorithm stopped, when applied 
to a point set X C 

The resulting spanner is a 3n-approximation regardless of the random bits, and thus the correct- 
ness of the net-tree algorithm is guaranteed. We only need to argue about the expected running 
time for constructing the HST. The running time of the HST constructed is dominated by the 
spanner construction and the number of edges in it (see Lemma l3.5() . Denote by A the doubling 
constant of the metric Ad . 

Claim 9.3 For any X Q M, 

1. E[-5(X)-3] > x-'^/l&. 

2. e[5(X)3] =o(A3). 

Proof: Consider the algorithm above for computing 5{X). Once i reaches the value k = [log2 A] , 
the probability of success on each point sampled is at least 2~^^ (by the argument in Lemma [23) • 
Hence the probability of success in the ith round, i > k, conditioned on a failure in all previous 
rounds is at least 1 — (1 — 2"^^^)^^', which means that 

E[6{Xr^] > 1 - (1 - 2-3'=)2''2-3^ > (1 - l/e)A-V8. 
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It also means that Pr[6 > 2*=+*] < (1 - 2 

oo oo 

^[^^] =Y.PA^^ >t]<2X^+ exp 

i=l t=23k 



< exp^ 

log t—3k 
3 
2 



, and therefore 
I < 2A^ + 0(1) 



We only prove an upper bound on the running time. Bounding the number of edges is similar. 
Denote by f{X) the running time of the algorithm when applied to X C A^, and let g{X) = 
BifiX)], and g{n) = supj^^Ai, \x\=n9iX). 

The spanner construction algorithm of Lemma 13.51 satisfies 



giX) < E 



max (g(a\X\) + ^((l - a)\X\) + c6(Xfn) 



(2) 



We now prove by induction that g{n) < cA^nlnn for some c > 0. Fix y to be a subset of Ai of 
size n such that g{Y) = g{n). We have 



g{n) < E 

< E 

< E 



max {-E[g{a\Y\)] + B[g{{l - a)\Y\)] + 



n] 



5(y)-3<a<l/2 

max (cX^alYl ln(a\Y\) + cX^(l - a)\Y\ ln((l - Q)|y|) + 



n] 



n 



< cX-^n ■ E 



< cA'^n • E 



< cA^nlnn + cA^n • E 



5-^ \n{6-^n) + (1 - d'^) ln((l - 5--') n) + {c" X^ + d')n 
ln((l-(5-3) n)j +{d'X^ + d')n 

ln(l - + (c"a3 + d')n 



< cX^n Inn - cX^n • E + (c"A^ + d')n 

< cX^n Inn- cX^n • (A" ^16) + (c"A^ + d')n 

< cA^nlnn, 

since ln(l — 5""^) < -5^"^, by Claim and for c > large enough. ■ 

10 Concluding Remarks 

In this paper, we show how to efficiently construct hierarchical nets for finite spaces with low 
doubling dimension, and use it in several applications. We believe that this result will have further 
applications. 

Among other things, our fast construction of WSPD implies a near linear time construction of 
approximate minimum spanning tree of the space. Our fast construction of net-tree implies that 
one can do 2-approximate A:-center clustering in 0(n log n) expected time. 

Further transfer of problems and techniques from low dimensional Euclidean space to low dimen- 
sional metrics seems to be interesting. A plausible example of such a problem is the construction of 
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(1 + e)-spanners with some additional properties (such as low total weight or small hop-diameter). 
Results of this flavor exist in low dimensional Euclidean spaces. 

It is easy to verify, that for general metric, no HST can be constructed without inspecting all 
(2) edges. Indeed, consider the uniform metric over n points, and change in an adversarial fashion 
a single edge to have length 0. 

10.1 All nearest neighbors. 

The all nearest neighbor problem is to compute for a set P of n points the (exact) nearest neighbor 
for each point of p E P in the set P \ {p}. It is known that in low dimensional Euclidean space 
this can be done in 0(n log n) time |13 | I46 | [TT|. One can ask if a similar result can be attained for 
finite metric spaces with low doubling dimensions. Below we show that this is impossible. 

Consider the points pi, . . . ,pn, where the distance between pi and pj, for i < j, is either 2^ or 
2^ +£, for e < 0.1. It is easy to verify that this metric has doubling constant at most three. We now 
show that for any deterministic algorithm for computing all nearest neighbors, there is a metric in 
the family of the metrics described above for which the algorithm performs (2) distance queries. 

This claim is proved using an adversarial argument: When the adversary is queried about the 
distance between pi and pj, for i < j, then if not all the distances between pi, . . . ,Pj-i and pj were 
specified, the adversary will always return the distance to be 2^ + e. The distances 2^ would be 
returned only for the last pair among the j — 1 pairs in this set. In particular, for the algorithm 
to know what is the closest point to pj, it must perform j — 1 queries. Thus, overall, an algorithm 
doing all nearest neighbors for pi, ... ,pn, will have to perform (2) queries. 

A similar asymptotic lower bound can be proved for randomized algorithms using Yao's principle 
(here the adversary selects for each j one index ij < j at random for which d{pi.,pj) = 2^ , and for 
the rest of i 7^ ij, i < j, d{pi,j) = 2^ + e). 

At this point, it is natural to ask whether one can achieve running time of 0(nlog(n<I>(P))) for 
the all nearest neighbor problem. This, however, is straightforward. Indeed, compute 4-WSPD of 
P. Clearly, if g is a nearest neighbor for p, then there is a pair in the WSPD such that p is the 
only point on one side, and the other side contains q. Thus, we scan all such unbalanced pairs 
(one point on one side, and many points on other side), and compute the nearest neighbor for each 
point. Thus, this computes all nearest neighbors. As for the running time analysis, consider all 
such pairs in distance range / to 21, and observe that by a packing argument, for any node u in 
the net-tree, the number of such WSPD pairs with u in them is a 2'^^'^™''. In fact, along a path in 
the net-tree, only a constant number of nodes might participate in such pairs. Thus, every point is 
being scanned 2*^'-'^™^ times, implying that scanning all such pairs takes 2<^(<^™)n time. There are 
[lg(<I>(P))] resolutions, so the overall running time is 2<^('i''^)nlog(n$(P)). 
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A Proof of Lemma 13.91 

Notice that Lemma 13.71 implies that (rj)j>i is monotone non-increasing sequence, and that > 
ri > Til {I +n~2) > |ri. 

Proof of Lemma \3.y[ We prove by induction on k all five assertions together. The base case is 
obvious. Assume by the induction hypothesis that T^^^^^ satisfies all the properties above, and we 
prove it for r(^). 

Property Every point inserted during the Ith phase (i.e., a point for which [log^Fj] = 

must have its current parent (in T^^^) at level /. Thus, if i{u) > I, this means that Cp^, was inserted 
before the current phase, which means that it is indeed the closest point to pk among {pi, . . .ph}- 
Otherwise, if £{u) = I, then 

dM{u,q) < dM{u,Cp^) + dMicp„Pk) + dM{Pk,q) < 2 ■ t' + (1 + n-2)r' + (1 + n-2)r' < 13 • r'. 

Since q appears before the level I began, either i{p{q)) > I and then q £ Rel(?i), or £{p{q)) = I, but 
then it must be that repp^^-j = q, so p(g) G Rel(ti). Either case g is a representative of a vertex in 
Rel('ii) which is the same as Rel(n) (in T^'^-i)). 

Property du]). We shall prove it both for p^, and the new internal vertex (in case ^ of the 
construction). Consider first case in the construction: 

c^x(i'ep„,rep^) = dM{Tep^,q) < /^^^ 

where the last inequality follows from the induction hypothesis. Also, 

dM{^ep,,pk)<2-r'^^\ 

and we are done with the first case of the construction. 

Case ((E| follows from the definition of q, and since as argued above, for u = p((?), rep„ = q. 

Property ()iii|>. Fix some t £ JR., and let x and y be two vertices for which max{^(x), ^(y)} < 
t < min{£(p(x)),£(p(y))}. If both x and y are not pk then the claim follows from the inductive 
hypothesis (even for the new formed internal vertex, since it inherits its parent and representative 
from a previously established vertex). Otherwise, assume x = pk- As is the latest addition of 
leaf to T, d;v((pfc, repy) > r^-i > r'^^. Note that ^(p(pfe)) = I, so t < I, and we conclude that 
d>i(rep3,,repj^) > r*-i. 

Property (jivjl . We next prove that r(^) is a net-tree. The only non-straightforward claims 
are the packing and covering properties. The covering property follows from Property ^ of this 
lemma: Let ti = ui be a vertex, v = Um a descendant, and {ui, . . . , Um) the path between them in 
T, then 

m—1 m—1 m—1 

d^(rep„,repj < dA^(rep„,, rep„,^ J < 2 ^ r^(«>) < 2 r^(«i)-(-i) < • 

i=l i=l i=l 
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The packing property is more delicate. Let w be an arbitrary vertex in T^^\ and x ^ Py^ a 
point. We want to prove that d_A4(x,rep^) > 2^^^t^^^^'^^^~^ . Let x £ T^^'^ be an ancestor of x 
such that l{x) < £{p{w)) — 1 < £{p{x)). Applying Property with t = i{p{w)), we get that 
dAi(repj,rep^) >T^(p(-))-i. 

If X = x, we are done. Else, if £{x) < £{'p{w)) — 1, then, by Property (jn)), we have 

dM{rep^,x) > dA,(rep^,repj) - dA,(repj,x) > /(p(-))-i - ^ . r^(p(-))-2 = . /(p(-))-i, 
and we are done. 

Otherwise, let x € T^^^ be an ancestor of x which is the child of x (p(x) = x). If rep^ = repj 
then the preceding argument (where i{x) < i{p{'w)) — 1) also applies here, and we are done. 

Otherwise, we get the following situation: £{p(x)) = i{p{w)) — 1, and rep^ / repp(j.), but this 
can happen only if x was inserted during level £{p{w)) — 1. Recall that the algorithm connects 
rep^ as a child of vertex in level £{p{w)) — 1 whose representative is the closest point among those 
appearing during the levels greater than £{p{w)) — 1. Note that both repp(^) and rep^ inserted in 
level greater than £(p{'w)) — 1, we conclude that (i;\4(rep^, repp(^)) < (i>i(rep^, rep^), therefore 

dA^(repj;,rep^) > max|(i>i(rep5;,repp(5;)),dA4(rep^,repp(5;)) - ^^^(repj;, repp(j;))| 

^ d^(rep^,repp(^.)) ^ ^ ^ _ ^iipM)-i_ 



Hence, by the covering property, 

dM{x,repJ > d7n(rep5;,rep^) - dA^(repj;,x) > 0.5 • T 



e{p{w))-l _ 2t_ . ^£(p(«>))-2 



T-5 . ^(p(«,))-l 



and we are done. 



Property (jvj) . Assume that a new vertex x is attached as a child to a vertex y. We shall prove that 
our traversing algorithm visits all vertices w for which either w E Rel(x) or x S Rel(?i;). Suppose 
first that x G Rel(w). Thus, £{w) < £{y). Let z be an ancestor of w for which £{z) < £{y) < £{p{z)). 
Let {z = zi, . . . , Zm = w) be the path between them in T. Then, for any l<i<m — 1, it holds 

dAi(rep^,rep^J < dMirep^,repJ + duirep^^^TepJ < 13 • r^^^™) + ^ • r^(^') < 13 • t^^'^\ 

Thus X G Rel(zj), for any 2 < i < m. So if z = zi G Rel{y), we are assured that — ZfYi will be 
visited. Indeed, z £ Rel(y) since, 

c?>!(repj^,repj < d^^Crepy, rep^) + ^^^(repj,, rep^) + ^^^(rep^, repj 

< 2 • + 13 • r^^"-) + ^ • r^(^) < 2 • t^^-^^ + 13 • t^^^^-^ + ^ • t^^^^ < 13 • T^^y\ 

— T— 1 — T— 1 — 

This means that z £ Rel(y), by the inductive hypothesis. 

Next, we consider the case when w £ Rel(x). In this case £{w) < £{x) < £{'p{w)) and 

(ix(rep^,repj^) < (i;n(rep^, rep^,) + (ix(rep^, rep^^) < 13 • r^^"") + 2 • /^^^ < 13 • /^^^ 
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Hence, if £{-p{w)) > £{y) then w G Rel(y) which implies that w G Rel(y) by the inductive 

hypothesis, and we are done. 
If ^(pH) =e{y) then 

<^A^(repp(^„),repj^) < dx(repp(^), rep^) + dx(rep^, rep^,) + ^^^(repj., rep^^) 

< 2 • r^(2/) + 13 . ri{y)-i + 2 . r^iv) < 13 • 

So, in this case p{w) G Rel(?/), and using the inductive hypothesis, we are done. 
We are left with the case £(p(w)) < £{y). In this case 

dM (repp(^) , rep^) < (Im (repp(^) , rep^) + (Im (rep^^,, rep^,) 

< 2 . T^ipM) + 13 . ,-^(p(«'))-i < 13 . ^^(p("')). 

So we have that x G Rel(p(t(;)). As was proved above, this means that p{w) will be visited, and 
since x is added to Rel(p(?«)), the algorithm also visits the children of p{w), and in particular, w. 
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